Abstract

Many online collaboration networks struggle to gain user activity and
become self-sustaining due to the ramp-up problem or dwindling activity
within the system. Prominent examples include online encyclopedias such as
(Semantic) MediaWikis, question-and-answer portals such as StackOverflow,
and many others. Only a small fraction of these systems manage to reach
self-sustaining activity, a level of activity that prevents the system from
reverting to a non-active state. In this paper, we model and analyze
activity dynamics in synthetic and empirical collaboration networks. Our
approach is based on two opposing and well-studied principles: (i) without
incentives, users tend to lose interest in contributing and thus systems
become inactive, and (ii) people are susceptible to actions taken by their
peers (social or peer influence). With the activity dynamics model that we
introduce in this paper we can represent typical situations of such
collaboration networks. For example, activity in a collaborative network,
without external impulses or investments, will vanish over time, eventually
rendering the system inactive. However, by appropriately manipulating the
activity dynamics and/or the underlying collaboration networks, we can
jump-start a previously inactive system and advance it towards an active
state. To be able to do so, we first describe our model and its underlying
mechanisms. We then provide illustrative examples of empirical datasets and
characterize the barrier that has to be breached by a system before it can
become self-sustaining in terms of critical mass and activity dynamics.
Additionally, we expand on this empirical illustration and introduce a new
metric p (the Activity Momentum) to assess the activity robustness of
collaboration networks.


1 Introduction 

One of the major problems faced by both new and existing online social and
collaboration networks, such as Facebook or StackOverflow, revolves around
efficiently identifying and motivating the appropriate users to contribute
new content. In an optimal scenario, this newly contributed content
provides enough incentive for other users to contribute, triggering further
actions and contributions. Once such a self-reinforced state of increasing
activity is reached, we can say that a system becomes self-sustaining,
meaning that sufficiently high levels of activity are reached, which will
keep the system active without further external impulses. For example, when
looking at well-established collaborative websites, such as StackOverflow
or Wikipedia, we already know that at some point in time these systems have
become self-sustaining (in terms of activity), evident in their steadily
growing number of supporters and overall activity.

However, these self-sustaining states are neither easy to reach nor
guaranteed to last. For example, Suh et al. [HI] showed that the growth of
Wikipedia is slowing down, indicating a loss in momentum and perhaps even
first evidence of a collapse. Moreover, we typically lack the tools to
properly analyze these trends in activity dynamics and thus cannot even
perform such simple tasks as detecting self-sustaining system states.
Therefore, we argue that new tools and techniques are needed to model,
monitor and simulate activity dynamics for collaboration networks.

The high-level contributions of this work are two-fold. First, we introduce
a model that is capable of simulating activity dynamics for online
collaboration networks. Second, we describe in detail how to fit the model
to empirical datasets, simulate trends in activity dynamics and interpret
our findings. The proposed model is based on the formalism of continuous
deterministic dynamical systems, meaning that activity is modeled by a
system of coupled non-linear differential equations. Each user of the
system is represented by a single quantity (the current activity), and the
social ties between users define the coupling of variables. In general,
when using dynamical systems on networks, we define the (micro-)behavior of
each user to observe and gather new insights into the (macro-)behavior of
the system. For a more detailed introduction to dynamical systems see the
later sections of this paper and Newman [65]. For simplicity, we do not
take individual differences between users into account: the dynamics and
its parameters are the same for each user in the population. This allows us
to configure the model with a single parameter, which is a ratio of the
following two parameters, representing two basic activity mechanisms (cf.
Figure 1) in online collaboration networks:

(i) Activity Decay Rate $\lambda$, which postulates how fast users lose
interest in contributing,

(ii) Peer Influence Growth Rate $\mu$, postulating to what extent users are
influenced by the actions taken by their peers.

[Figure 1, panels (a) to (d): Intrinsic Activity (blue) and Peer Influence
(yellow) at times $t_0$, $t_1$, $t_2$ and $t_3$.]

Figure 1: Intrinsic Activity and Positive Peer Influence. Activity
dynamics in collaboration networks, represented by users as nodes,
collaboration as edges and activity as node size (Figure (a)), are based on
two opposing principles. The Activity Decay Rate postulates the loss of
intrinsic activity (blue color of nodes) per user over time. In contrast,
the Peer Influence Growth Rate follows the intuition that users in
collaboration networks are (positively) influenced by their peers (yellow
color of nodes), where more active peers exercise a higher influence than
less active peers. We initialize the network at time $t_0$ with random
intrinsic activities. Nodes with a green halo at times $t_1$ to $t_3$
represent users that exhibit a gain in their overall activity between two
iterations $t_n$ and $t_{n+1}$, as the exercised positive peer influence is
higher than the intrinsic loss of activity. Analogously, red halos
represent decreases in overall activity. At first, very central (high
degree) nodes with smaller activity values manage to increase their overall
activity, while very active central nodes already start to lose activity.
After $t_3$ or more iterations, due to overall decreasing activities and
hence decreasing peer influences, all nodes in the collaboration network
eventually start to lose activity and inevitably converge towards zero
activity.

A first analysis of the model shows that activity dynamics in collaboration
networks have an obvious and natural fixed point, the point of complete
inactivity, where all contributions of the users have ceased. However, by
slightly manipulating the parameters in our model we show that it is
possible to destabilize the fixed point, resulting in a potential increase
of activity. We then outline the process of calculating the Activity Decay
Rate and Peer Influence Growth Rate for existing collaboration networks,
simulate their corresponding activity dynamics and expand our understanding
of critical mass, via the notion of System Mass and Activity Momentum, in
collaboration networks by interpreting our findings.

The remainder of this paper is structured as follows: In Section 2 we
introduce and examine our model analytically. We then continue with the
model illustration by simulating activity dynamics for a synthetic dataset
and discuss different evolution scenarios of our parameters and their
implications. In the sections that follow, we outline the process of
applying our model to empirical datasets, introduce the notion of System
Mass and Activity Momentum, review related work, and finally summarize our
findings and discuss limitations and implications for future work.


2 Modeling Activity Dynamics 

We model activity dynamics in an online collaboration network as a
dynamical system on a network. Hereby, the nodes of a network represent
users of the system and links represent the fact that the users have
collaborated in the past. We represent the network with an $n \times n$
adjacency matrix $A$, where $n$ is the number of nodes (users) in the
network. We get $A_{ij} = 1$ if nodes $i$ and $j$ are connected by a link
and $A_{ij} = 0$ otherwise. Since collaboration links are undirected, the
matrix $A$ is symmetric, thus $A_{ij} = A_{ji}$ for all $i$ and $j$. We
denote the total number of links in the network with $m$, and thus we have
$2m = \sum_{ij} A_{ij}$.
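The network representation described above can be sketched in a few lines of plain Python. The four-user network and the helper name `adjacency_matrix` are illustrative assumptions, not part of the paper; the sketch only checks the two stated properties, symmetry and the link count identity.

```python
def adjacency_matrix(n, links):
    """Build a symmetric n x n adjacency matrix from undirected links."""
    A = [[0] * n for _ in range(n)]
    for i, j in links:
        A[i][j] = 1
        A[j][i] = 1  # collaboration links are undirected
    return A


# Hypothetical network: 4 users, 4 collaboration links.
links = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, links)

m = len(links)                          # number of links
total = sum(sum(row) for row in A)      # sum over all entries A_ij
is_symmetric = all(A[i][j] == A[j][i] for i in range(4) for j in range(4))

print(total == 2 * m, is_symmetric)
```

Every undirected link contributes two entries to the matrix, which is why the sum over all entries equals twice the number of links.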

We model activity as a continuous real-valued variable $a_i$ evolving on
node $i$ of the network in continuous time $t$. The general time evolution
equation can be written as follows (see also Newman [65]):

$$\underbrace{\frac{da_i}{dt}}_{\text{Activity Evolution of } i} = \underbrace{f(a_i)}_{\text{Intrinsic}} + \underbrace{\sum_j A_{ij} \underbrace{g(a_i, a_j)}_{\text{Influence of } j \text{ on } i}}_{\text{Peer Influence}} \tag{1}$$

where $f(a_i)$ specifies the intrinsic activity evolution of node $i$ and
$g(a_i, a_j)$ describes the influence of neighbor $j$ on node $i$. To
simplify, we assume that the intrinsic activity dynamics as well as the
influence of node neighbors are the same for each node $i$ and for each
neighbor pair $(i, j)$. This means that we have a single intrinsic activity
function $f(a_i)$ for all nodes $i$, as well as a single peer influence
function $g(a_i, a_j)$ for all node pairs $(i, j)$.
In addition, we make the following assumptions:

Intrinsic Activity Decay. Without external incentives or without positive
influence from their social connections, each user has a tendency to slowly
reduce activity. For example, people slowly lose interest in participating
in collaborative networks or exhaust their resources. An observation that
specifically reflects this inherent exhaustion of activity over time has
been made by Danescu-Niculescu-Mizil et al. [2H] for different online
communities. We model this situation by using a linear function for
$f(a_i)$:

$$f(a_i) = -\lambda a_i, \quad \lambda > 0 \tag{2}$$

We call parameter $\lambda$ the Activity Decay Rate: the rate at which
users reduce their activity per unit time, given a complete absence of
other (positive) incentives. The specific form of $f(a_i)$ results in an
exponential decay of activity without any external influence
($a_i(t) = a_i(t_0) e^{-\lambda t}$, with $a_i(t_0)$ being the initial
activity of node $i$ at time $t_0$). Thus, without other positive impulses
the activity of every user will decay over time (see Figure 2(a)).

[Figure 2, panels: (a) Intrinsic Activity Decay; (b) Extrinsic Peer
Influence.]

Figure 2: Intrinsic Activity Decay is the rate at which users reduce their
activity per unit time and is represented as a linear function in the form
of $f(a) = -\lambda a$, which results in an exponential decay in activity
that converges towards zero. Extrinsic Positive Peer Influence describes to
what extent users are influenced by the actions taken by their peers, and
is represented as a monotonically increasing function of a user's activity
in the form of $g(a) = (qa)/\sqrt{a_c^2 + a^2}$. It naturally saturates at
Maximum Peer Activity Flow $q$ as activity reaches infinity and, in our
simulations, can never be negative by definition (see Equation 3). When the
user activity passes the point of the Critical Activity Threshold $a_c$,
peer influence gains notable weight and influences neighbors to "do
something" (become active).
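The exponential decay that follows from the linear intrinsic activity function can be checked numerically. The sketch below integrates the decay term alone with forward Euler steps and compares the result with the closed form, taking the initial time as zero. The function names, the step size and the parameter values are illustrative assumptions; the decay rate is written `lam` in the code.

```python
import math


def intrinsic(a, lam):
    """Intrinsic activity evolution f(a) = -lam * a (Equation 2)."""
    return -lam * a


def euler_decay(a0, lam, t_end, dt=1e-4):
    """Integrate da/dt = f(a) with forward Euler steps (no peer influence)."""
    a = a0
    for _ in range(int(round(t_end / dt))):
        a += dt * intrinsic(a, lam)
    return a


a0, lam, t_end = 1.0, 0.5, 4.0           # illustrative values only
numeric = euler_decay(a0, lam, t_end)
exact = a0 * math.exp(-lam * t_end)      # closed-form exponential decay

print(numeric, exact)
```

With a small step size the two values agree closely, and both are well below the initial activity, as expected for a user who receives no positive impulses.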

Positive Peer Influence. People tend to copy their friends [23, 5, 86],
meaning that if neighbors of a node $i$ are active they will positively
influence node $i$ to become active as well. The magnitude of the
influence, or the "speed" at which the influence is transferred from an
active node to its neighbors, will depend on two quantities (cf. Figure 2):

(i) Critical Activity Threshold $a_c$, which represents a soft threshold of
activity that marks the point when users have an activity potential that
notably exercises influence on their peers. Note that influence is
exercised at all levels of activity. However, once $a_c$ is reached, the
influence is determined as "notable" (e.g., a level of activity that is
above the average activity per user) for the corresponding peers. Hence,
this critical level of activity is a system-dependent quantity. One can
imagine that in a system with high user activity (e.g., a large number of
changes per user) the critical activity is higher than in a system with
lower levels of activity. For example, in the latter case the users will
sooner notice a neighbor who became active recently. We model the Critical
Activity Threshold as a continuous threshold, meaning that active users
will always influence their neighbors, but will exercise more influence
after they have passed the critical level of activity.

(ii) Maximum Peer Activity Flow $q$ represents the maximum activity flow
per unit time from users to each of their neighbors. This maximum flow is
reached as user activity approaches infinity. However, substantial amounts
of the maximum flow are already reached whenever the user activity passes
the level of the critical activity $a_c$.


Thus, to model peer influence, we resort to a monotonically increasing
function, where more active neighbors are always more influential than less
active ones. Additionally, the function $g(a_j)$ saturates for sufficiently
large values of activity, inducing a natural limit on how much users can be
influenced by their neighbors. We model this by setting
$g(a_i, a_j) = g(a_j)$ and choosing an algebraic sigmoid function with:

$$g(a_j) = \frac{q a_j}{\sqrt{a_c^2 + a_j^2}}, \quad a_c > 0. \tag{3}$$

Peer influence can also be analyzed in terms of the growth rate of $g(a)$,
in the form of the derivative $dg/da$ of the function $g(a)$. After
simplifying and rearranging, the growth rate can be calculated as:

$$\frac{dg}{da} = \frac{q a_c^2}{(a_c^2 + a^2)^{3/2}}. \tag{4}$$

In the limit of large activity $a$ the derivative of $g(a)$ tends towards
zero, thus peer influence saturates at $q$. On the other hand, the maximum
change in influence is observed when $a = 0$: neighbors who suddenly become
active will be noted most, in terms of activity, by their peers.
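The behavior of the peer influence function and its growth rate can be made concrete with a short sketch. It evaluates the algebraic sigmoid of Equation 3 and its derivative at a few activity levels. The function names and the numeric values chosen for the maximum flow and the critical activity are assumptions made for illustration.

```python
import math


def peer_influence(a, q, a_c):
    """Algebraic sigmoid g(a) = q*a / sqrt(a_c^2 + a^2) (Equation 3)."""
    return q * a / math.sqrt(a_c ** 2 + a ** 2)


def peer_influence_growth(a, q, a_c):
    """Derivative dg/da = q*a_c^2 / (a_c^2 + a^2)^(3/2)."""
    return q * a_c ** 2 / (a_c ** 2 + a ** 2) ** 1.5


q, a_c = 2.0, 10.0  # illustrative maximum flow and critical activity

# Influence and growth rate at zero, below, at, and far above the threshold.
for a in (0.0, 0.1 * a_c, a_c, 5 * a_c, 1000 * a_c):
    print(a, peer_influence(a, q, a_c), peer_influence_growth(a, q, a_c))
```

The printed values show the two limits discussed in the text: the growth rate is largest at zero activity, where it equals the maximum flow divided by the critical activity, and the influence itself approaches the maximum flow for very active users without ever exceeding it.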


2.1 Dynamics Equation 


With $f(a_i)$ and $g(a_j)$ defined, the activity dynamics equation becomes:

$$\frac{da_i}{dt} = -\lambda a_i + \sum_j A_{ij} \frac{q a_j}{\sqrt{a_c^2 + a_j^2}} \tag{5}$$

The different parameters of the equation have dimensions. For example,
$a_i$ and $a_c$ have activity as unit, $t$ has seconds as unit, $\lambda$
is a rate and has inverse seconds as unit, and $q$ has activity per second
as unit. Further, the equation has three free parameters, which span a huge
parameter space that is difficult to explore in detail. Therefore, our
first step is to simplify the equation and express it in a dimensionless
form, which typically also has a smaller number of parameters as only their
relative ratios, rather than their absolute values, are of importance.
Another advantageous side-effect of a dimensionless formulation is that it
eliminates the absolute values of the properties under investigation, in
our case user activity, which can be difficult to interpret.

There are many ways to eliminate dimensions from such equations [53].
A useful heuristic is to try to first eliminate the dimensions from the
most non-linear term in the equation, which in our case is $g(a_j)$. Thus,
we begin by defining a relative activity $x$ as the ratio between the
activity $a$ and the critical activity $a_c$:

$$x = \frac{a}{a_c} \tag{6}$$

The variable $x$ is now dimensionless and easy to interpret. For example,
$x = 5$ means that a user exercises a strong influence on their neighbors,
since the level of activity is five times the critical activity $a_c$. In
fact, the influence in this case is
$g(5a_c) = (5q)/\sqrt{26} \approx 0.98q$. On the other hand, if $x \ll 1$
(e.g., $x = 0.1$), the influence of users on their neighbors is much
smaller, as $g(0.1a_c) = (0.1q)/\sqrt{1.01} \approx 0.1q$.

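By rearranging, substituting $x$ for $a$ and simplifying ($a_c$ cancels in
the second term) our activity dynamics equation reduces to:

$$a_c \frac{dx_i}{dt} = -\lambda a_c x_i + q \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}} \tag{7}$$

To eliminate the dimensions from the second term we divide both sides by
$q$:

$$\frac{a_c}{q} \frac{dx_i}{dt} = -\frac{\lambda a_c}{q} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}} \tag{8}$$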
The term $q/a_c$ is the growth rate of the function $g(a)$ evaluated at
zero:

$$\left.\frac{dg}{da}\right|_{a=0} = \left.\frac{q a_c^2}{(a_c^2 + a^2)^{3/2}}\right|_{a=0} = \frac{q}{a_c} \tag{9}$$










This quantity gives the rate at which the influence on the peers grows if
the user activity experiences a small displacement from the point of zero
activity. Let us now define this quantity as Peer Influence Growth Rate and
denote it with $\mu = q/a_c$ since this will simplify the algebra and will
make the model interpretation more intuitive. Thus, the last equation can
then be written as:

$$\frac{1}{\mu} \frac{dx_i}{dt} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}} \tag{10}$$


Finally, we also want to scale time $t$ and express the equation in terms
of dimensionless time $\tau$. This last reformulation further simplifies
the equation and allows us to interpret and compare activity dynamics over
time across various systems, because dimensionless time $\tau$ scales the
time evolution of different systems relative to each other. Let us make the
following substitution:

$$\tau = \mu t. \tag{11}$$


By substituting $\tau$ for $t$ in the term on the left-hand side of
Equation 10 we arrive at the dimensionless dynamics equation:

$$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}} \tag{12}$$


Now, there is only one parameter in our dynamics equation, namely the ratio
$\lambda/\mu$. This is a dimensionless ratio of two rates: (i) the Activity
Decay Rate $\lambda$, which is the rate at which a user loses activity, and
(ii) the Peer Influence Growth Rate $\mu$, which is the rate at which a
user gains activity due to the influence of a single neighbor.

The ratio between those two rates describes how much faster users lose
activity due to the decay of intrinsic activity (or interest) than they can
gain due to positive peer influence of a single neighbor. For example, a
ratio of $\lambda/\mu = 100$ would mean that users intrinsically lose
activity 100 times faster than they can potentially get back from one of
their neighbors. If we set $\lambda/\mu = 1$, users would lose activity as
fast as they can regain it from one of their peers. For a short description
of all parameters of the activity dynamics model see Table 1.
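The change of variables can be sanity-checked numerically. The sketch below evaluates the right-hand side of the dimensional equation (Equation 5) and of the dimensionless equation (Equation 12) for the same state and confirms that they differ only by the constant factor that the two substitutions introduce, namely the critical activity times the Peer Influence Growth Rate. The three-user network, the function names and all numeric values are assumptions for illustration; the decay rate is written `lam` and the growth rate `mu`.

```python
import math


def dimensional_rhs(a, A, lam, q, a_c):
    """Right-hand side of Equation 5: da_i/dt for every node i."""
    n = len(a)
    return [-lam * a[i]
            + sum(A[i][j] * q * a[j] / math.sqrt(a_c ** 2 + a[j] ** 2)
                  for j in range(n))
            for i in range(n)]


def dimensionless_rhs(x, A, ratio):
    """Right-hand side of Equation 12: dx_i/dtau, with ratio = lam / mu."""
    n = len(x)
    return [-ratio * x[i]
            + sum(A[i][j] * x[j] / math.sqrt(1.0 + x[j] ** 2)
                  for j in range(n))
            for i in range(n)]


# Hypothetical 3-user network: user 0 collaborates with users 1 and 2.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
lam, q, a_c = 0.3, 2.0, 4.0     # illustrative values only
mu = q / a_c                     # Peer Influence Growth Rate

a = [1.0, 8.0, 0.5]              # activities
x = [ai / a_c for ai in a]       # relative activities (Equation 6)

lhs = dimensional_rhs(a, A, lam, q, a_c)
# Since a = a_c * x and tau = mu * t, we have da/dt = a_c * mu * dx/dtau.
rhs = [a_c * mu * d for d in dimensionless_rhs(x, A, lam / mu)]

print(max(abs(l - r) for l, r in zip(lhs, rhs)))
```

The printed difference is at the level of floating-point rounding, which is consistent with the two formulations describing the same dynamics.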
2.2 Linear Stability Analysis


In general, Equation 12 is a coupled set of $n$ ($n$ being the number of
nodes or users in the network) non-linear differential equations, for
which, in a typical case, no closed form solution can be found. Therefore,
we turn our attention to the properties of so-called fixed points. A fixed
point $x^*$ represents all the values for $x_i$ for which the system does
not change in time:

$$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i^* + \sum_j A_{ij} \frac{x_j^*}{\sqrt{1 + (x_j^*)^2}} = 0, \quad \forall i \tag{13}$$

Suppose that we are able to find a fixed point $x^*$ by solving Equation
13. One obvious fixed point in our model is $x^* = 0$, meaning that $x_i^*$
has the same value for every $i$: $x_i^* = x^* = 0$, representing a simple
special case: a symmetric fixed point. We can easily check that $x^* = 0$
is indeed a fixed point since $f(x^*) = g(x^*) = 0$, and this also gives
$f(x^*) + \sum_j A_{ij} g(x^*) = 0, \forall i$.


Table 1: Model and model parameters. The activity dynamics equation is in
a dimensionless form and scales over relative time $\tau$. All properties,
as well as the single parameter of the model, are briefly described under
Properties and Parameters.

Equation                                                                                       Name
$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}}$    Activity Dynamics Equation

Properties             Name
$\lambda$              Activity Decay Rate
$q$                    Maximum Peer Activity Flow
$a_c$                  Critical Activity Threshold
$\mu = q/a_c$          Peer Influence Growth Rate
$\tau$                 Relative Time Scale

Parameter              Name
$\lambda/\mu$          The ratio describing how fast users intrinsically
                       lose activity compared to how fast they get it back
                       from (one of) their neighbors.

We are investigating this specific fixed point, as it also has a particular
interpretation in our model. At this fixed point all users have zero
activity, which means that they are completely inactive and the system is
in an inactive or "dead" state. If the system is in such a state and no
external incentives are provided, nothing will ever change and the system
will remain inactive indefinitely.

Typically, we are interested in the implications on the system if we
provide a small enough impulse to leave such a steady (inactive) state. In
our context, the most interesting question is if the system will move from
an inactive state towards a state of lively activity or if it will just
revert to the inactive state. Technically, we are interested in the
stability of the fixed point. In particular, we want to know if the fixed
point is attracting (meaning that the system's activity in the proximity of
the fixed point will be attracted to it) or repelling (meaning that the
system's activity close to the fixed point will be pushed away from it).

To answer this question we linearize the functions in the proximity of a
fixed point. We represent the value of $x_i$ close to the fixed point with
$x_i = x_i^* + \epsilon_i$, where $\epsilon_i$ is sufficiently small. To
simplify the calculations, we concentrate on the case of a symmetric fixed
point, such as $x^* = 0$. Next, we perform a Taylor expansion about the
fixed point and linearize by neglecting the terms of second and higher
orders. After simplification we obtain (for details see e.g. Newman [55]):

$$\frac{d\epsilon_i}{d\tau} = -\frac{\lambda}{\mu} \epsilon_i + \sum_j A_{ij} \epsilon_j \tag{14}$$

where $\epsilon_i$ is the displacement of $x_i$ from the fixed point
$x_i^*$.

We can also write Equation 14 in matrix form, which gives:

$$\frac{d\epsilon}{d\tau} = \left(-\frac{\lambda}{\mu} I + A\right) \epsilon \tag{15}$$

where $I$ is the identity matrix and $A$ is the adjacency matrix.

We can solve the last equation by writing $\epsilon$ as a linear
combination of eigenvectors $v_r$ of the symmetric real matrix
$(-(\lambda/\mu) I + A)$:

$$\epsilon(\tau) = \sum_r c_r(\tau) v_r \tag{16}$$

Equation 15 then becomes:

$$\sum_r \frac{dc_r}{d\tau} v_r = \left(-\frac{\lambda}{\mu} I + A\right) \sum_r c_r(\tau) v_r = \sum_r c_r(\tau) \left(-\frac{\lambda}{\mu} + \kappa_r\right) v_r \tag{17}$$

where $\kappa_r$ are the eigenvalues of the graph adjacency matrix $A$. We
also used the fact that the matrix $(-(\lambda/\mu) I + A)$ has the same
eigenvectors as $A$, but with the eigenvalues $-\lambda/\mu + \kappa_r$.

The solution of the last equation for the coefficients of the linear
combination is then:

$$\frac{dc_r}{d\tau} = \left(-\frac{\lambda}{\mu} + \kappa_r\right) c_r(\tau) \implies c_r(\tau) = c_r(0)\, e^{\left(-\frac{\lambda}{\mu} + \kappa_r\right) \tau} \tag{18}$$

Now, the displacement from the fixed point will decay in time towards 0 if
the exponents for the coefficients $c_r(\tau)$ are all negative. Thus, we
arrive at the master stability equation for the special case of a dynamical
system that we defined as:

$$-\frac{\lambda}{\mu} + \kappa_r < 0, \quad \forall r \tag{19}$$

Since the adjacency matrix has both positive and negative eigenvalues, a
necessary stability condition is $\lambda/\mu > 0$, which is satisfied by
definition. Thus, we can rearrange Equation 19 and obtain the following
inequality:

$$\kappa_1 < \frac{\lambda}{\mu} \tag{20}$$

where $\kappa_1$ is the largest positive eigenvalue of the graph adjacency
matrix. Note that this inequality separates the network structure
($\kappa_1$) from the activity dynamics ($\lambda/\mu$).

If this stability condition is satisfied, the fixed point $x^* = 0$, in
which there is no activity at all ("inactive" system), represents a stable
fixed point. This also means that small changes in activity only cause the
system to momentarily leave the (attracting) fixed point until it becomes
inactive again.
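The stability condition can be checked for a given network by estimating the largest eigenvalue of its adjacency matrix and comparing it with the ratio of the two rates. The sketch below uses a shifted power iteration written in plain Python. This choice of numerical method, the function names and the five-user star network are assumptions made for illustration and are not taken from the paper.

```python
def largest_eigenvalue(A, iterations=500, shift=1.0):
    """Estimate the largest adjacency eigenvalue by shifted power iteration.

    Iterating on A + shift*I (shift > 0) avoids the oscillation that plain
    power iteration shows on bipartite graphs, whose spectrum is symmetric.
    """
    n = len(A)
    v = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [shift * v[i] + sum(A[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = max(abs(c) for c in w)
        v = [c / norm for c in w]
        estimate = norm - shift
    return estimate


def inactive_state_is_stable(A, ratio):
    """Stability condition of Equation 20: largest eigenvalue < ratio."""
    return largest_eigenvalue(A) < ratio


# Hypothetical star network: user 0 collaborates with users 1 to 4.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]

kappa_1 = largest_eigenvalue(star)
print(kappa_1)
print(inactive_state_is_stable(star, ratio=3.0))
print(inactive_state_is_stable(star, ratio=1.0))
```

For a star with four leaves the largest eigenvalue is the square root of four, so a ratio of 3 keeps the inactive state attracting while a ratio of 1 destabilizes it. For larger networks a library eigenvalue solver would be the more practical choice.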

For illustration, we initialized Zachary's Karate Club Network (cf. Figures
3(a) and 3(b)) with random activities between 0 and 0.1 per node and
simulated activity with our model. If the master stability equation holds
(Figure 3(c)), activity converges towards zero. However, when invalidating
the master stability equation (Figure 3(d)), activity converges to a new
and permanently active fixed point.

[Figure 3, panels: (a) Zachary's Karate Club Network; (b) Adjacency
Spectrum ($\kappa_1 = 6.726$); (c) Activity Evolution: Activity per Node
over $\tau$ ($\lambda/\mu = 120$, $\Delta\tau = 0.001$, $\kappa_1 = 6.73$);
(d) Activity Evolution: Activity per Node over $\tau$ ($\lambda/\mu = 1$,
$\Delta\tau = 0.001$, $\kappa_1 = 6.73$).]

Figure 3: Illustrative example. Top Left (a): Visualization of Zachary's
Karate Club. The size and color of a node represent random activity values
between 0.0 and 0.1 of the corresponding nodes (bigger and darker equals
higher values). Top Right (b): Eigenvalue spectrum of Zachary's Karate
Club network. The highest eigenvalue is 6.726. Bottom (c and d): Evolution
of activity with random initial activities (averaged over 10 runs). Bottom
Left (c): Activity dynamics with parameters satisfying the master stability
condition $\kappa_1 < \lambda/\mu$. Each line represents one node; all
activities converge to the state of zero activity. Bottom Right (d):
Invalidation of the master stability condition $\kappa_1 < \lambda/\mu$;
activity converges towards a new and permanently active fixed point.
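The two regimes shown in the figure can be reproduced in miniature. The sketch below integrates the dimensionless dynamics equation with forward Euler steps on a small complete graph of four users, whose largest adjacency eigenvalue is 3, once with a ratio above that value and once below it. The graph, the integration scheme, the step size, the random seed and the function names are all assumptions made for illustration; this is not the authors' simulation code and it does not use the Karate Club data.

```python
import math
import random


def step(x, A, ratio, dtau):
    """One forward Euler step of Equation 12 for all nodes."""
    n = len(x)
    dx = [-ratio * x[i]
          + sum(A[i][j] * x[j] / math.sqrt(1.0 + x[j] ** 2) for j in range(n))
          for i in range(n)]
    return [x[i] + dtau * dx[i] for i in range(n)]


def simulate(A, x0, ratio, tau_end=30.0, dtau=0.01):
    """Integrate the activity dynamics from x0 up to dimensionless time tau_end."""
    x = list(x0)
    for _ in range(int(round(tau_end / dtau))):
        x = step(x, A, ratio, dtau)
    return x


# Hypothetical complete graph on 4 users (largest eigenvalue 3).
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

rng = random.Random(0)
x0 = [rng.uniform(0.0, 0.1) for _ in range(4)]   # small random activities

inactive = simulate(A, x0, ratio=5.0)   # ratio above 3: condition holds
active = simulate(A, x0, ratio=1.0)     # ratio below 3: condition violated

print(inactive)
print(active)
```

With the ratio above the largest eigenvalue all activities shrink towards zero, while with the ratio below it every user settles at the same positive activity level, which for this particular graph and ratio is the square root of eight.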


In practice, additional system configurations are imaginable. Whenever the
ratio is below $\kappa_1$, the system becomes unstable, leaving the
inactive state. However, due to the special form of the peer influence
function, which saturates for large values of activity, the system will
converge towards another stable state of immanent activity (i.e., ratios
for periods 1 to 5 of Figure 4).

[Figure 4, top panel: Changes in Activity over Time (x-axis: time in
months; series: Activity Increase, Activity Variation, Activity Decrease).
Bottom panel: Changes of Ratio over Time (x-axis: Timespans of Simulation;
same series).]

Figure 4: Coupled evolution of activity and $\lambda/\mu$. The top Figure
depicts the evolution of activity (y-axis) over time (x-axis; in months)
for Zachary's Karate Club network with synthetically created (random)
activities. The ratios, which correspond to the activity evolutions over
time in the top Figure, are depicted in the bottom Figure (same symbol and
color), with the y-axis representing the value of the ratio, while the
different timespans are depicted on the x-axis. As long as
$\lambda/\mu < \kappa_1$ the network converges towards a state of immanent
activity, yet decreases in activity are possible (see timespans 2 to 4 of
the Activity Variation sections in top and bottom). If
$\lambda/\mu > \kappa_1$ the network converges towards an inactive state.

Thus, if the system is in the state where $\kappa_1 > \lambda/\mu$, we can
think of three different activity evolution scenarios, depending on the
current levels of activity present in the network:

1. If the levels of activity are lower than the ones the network converges
towards with the new ratio, we will see an increase in activity (e.g.,
timespans 1 to 2 of Activity Increase in Figure 4).

2. If the new ratio lets the system converge towards lower levels of
activity than currently present, activity will decrease, even though
$\kappa_1 > \lambda/\mu$ (e.g., see timespans 2 to 3 or 4 to 5 of Activity
Variation and Activity Decrease of Figure 4).

3. Lastly, the levels of activity have already converged towards their
fixed point and $\lambda/\mu$ is left unchanged, retaining the levels of
activity from the past (e.g., see timespans 0 to 1 of Activity Increase in
Figure 4).

If $\kappa_1 < \lambda/\mu$ holds, the system is stable and activity
converges towards the attracting fixed point at zero activity (see
timespans 5 to 6 of Activity Decrease in Figure 4).
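As a worked illustration of the active fixed point behind these scenarios, consider the special case of a $k$-regular network in which every user has the same relative activity $x$. Both restrictions are assumptions made here for tractability and are not part of the analysis above. Setting the right-hand side of the dimensionless dynamics equation (Equation 12) to zero then gives:

```latex
\begin{align*}
0 &= -\frac{\lambda}{\mu}\, x + k\, \frac{x}{\sqrt{1 + x^2}} \\
\Rightarrow\quad x^* &= 0
  \quad\text{or}\quad \sqrt{1 + (x^*)^2} = \frac{k\mu}{\lambda} \\
\Rightarrow\quad x^* &= \sqrt{\left(\frac{k\mu}{\lambda}\right)^2 - 1},
  \qquad \text{which is real and positive only if } k > \frac{\lambda}{\mu}.
\end{align*}
```

Because the largest adjacency eigenvalue of a $k$-regular graph is $k$, the active symmetric fixed point exists exactly when the stability condition of the inactive state is violated. For example, $k = 3$ and $\lambda/\mu = 1$ give $x^* = \sqrt{8} \approx 2.83$, while raising the ratio to $\lambda/\mu = 2$ lowers the fixed point to $x^* = \sqrt{1.25} \approx 1.12$. A system that had already settled at the higher level would therefore lose activity after such a change even though it remains in the active regime, which matches the second scenario.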

Summary of system stability analysis. In order to permanently leave the
stable state of complete inactivity we are interested in making the system
unstable. To be able to leave the attracting force of the fixed point at
zero activity we have the following two options:

(i) We provide (continuous) external impulses to the system, for example,
in the form of incentives for users to increase their activity, pushing the
system far away from the fixed point of no activity (and hope that it will
be attracted by another fixed point where activity is not zero).

(ii) We compromise the stability condition by either manipulating:

(a) the network structure (i.e., making $\kappa_1$ larger) or

(b) the activity dynamics (i.e., making $\lambda/\mu$ smaller).

Structurally, we can manipulate the size of $\kappa_1$ by creating or
removing links (and nodes) in our network (for more information on how to
manipulate $\kappa_1$ see [65]). Dynamically, $\lambda/\mu$ becomes smaller
if either $\lambda$ becomes smaller, meaning that the intrinsic user
activity decays at a slower pace, or $\mu$ becomes larger, meaning that
people copy their friends more and faster, or both.
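The structural option can be illustrated with a small example in which a single additional link pushes the largest eigenvalue past a fixed ratio. The chain of four users, the added link, the ratio of 1.8 and the helper functions (including the shifted power iteration used to estimate the eigenvalue) are hypothetical choices made for this sketch.

```python
def adjacency_matrix(n, links):
    """Build a symmetric n x n adjacency matrix from undirected links."""
    A = [[0] * n for _ in range(n)]
    for i, j in links:
        A[i][j] = A[j][i] = 1
    return A


def largest_eigenvalue(A, iterations=500, shift=1.0):
    """Estimate the largest adjacency eigenvalue by shifted power iteration."""
    n = len(A)
    v = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [shift * v[i] + sum(A[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = max(abs(c) for c in w)
        v = [c / norm for c in w]
        estimate = norm - shift
    return estimate


links = [(0, 1), (1, 2), (2, 3)]      # hypothetical chain of 4 users
ratio = 1.8                            # fixed ratio of decay to growth rate

before = largest_eigenvalue(adjacency_matrix(4, links))
after = largest_eigenvalue(adjacency_matrix(4, links + [(0, 3)]))

print(before, before < ratio)   # inactive state attracting before the link
print(after, after < ratio)     # condition violated once the chain is closed
```

Closing the chain into a ring raises the largest eigenvalue from roughly 1.62 to 2, so with the ratio held at 1.8 the inactive state changes from attracting to repelling without any change to the activity dynamics.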
2.3 Discussion on Parameter Evolution

At this time, we leave the investigation of the manipulation of the
activity dynamics ratio $\lambda/\mu$ as well as the manipulation of the
network structure to invalidate the master stability equation open for
future work. Nevertheless, before illustrating how our proposed activity
dynamics model can be applied to empirical datasets, we discuss potential
system evolution scenarios and their implications for activity.

Activity Decay Rate. Technically, if $\lambda$ increases, the ratio
$\lambda/\mu$ increases as well, resulting in higher (faster) losses of
activity per timespan. Once the system satisfies the master stability
equation ($\kappa_1 < \lambda/\mu$) it will inevitably become inactive. To
be precise, the larger $\lambda$ for a stable system, the faster activity
will converge towards zero. Essentially, an increase in $\lambda$
represents an increased intrinsic loss of activity for all users (e.g., due
to a lack of interest in contributing), while a decrease of $\lambda$ can
be interpreted as an increase of interest (more precisely, a slower loss of
interest) and thus higher levels of activity.

Evolution scenarios of Activity Decay Rate. We would expect to see an
increase in $\lambda$ on websites with low levels of user interaction and
activity (i.e., where individual contributions are not valued, as no
feedback is provided). On the other hand, websites that engage with their
users and provide steady updates (e.g., new content or functionality) will
likely see a consistent or even decreasing $\lambda$. In general,
practitioners can influence $\lambda$ by, for example, providing incentives
for users to contribute, such as badges, barnstars, likes, reputation
systems, or monetary incentives.

Peer Influence Growth Rate. With increasing values for $\mu$ the ratio
$\lambda/\mu$ decreases, resulting either (i) in an overall increase in
activity if the system is unstable ($\kappa_1 > \lambda/\mu$), (ii) in
prolonged timespans of activity before converging towards inactivity if the
system is stable ($\kappa_1 < \lambda/\mu$), or (iii) in an invalidation of
the master stability equation if $\lambda/\mu$ reaches a tipping point
where $\kappa_1 > \lambda/\mu$.

The evolution of $\mu$ directly corresponds to the evolution of the Maximum
Peer Activity Flow and Critical Activity Threshold.
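How the Peer Influence Growth Rate and the resulting ratio respond to changes in the Maximum Peer Activity Flow and the Critical Activity Threshold can be spelled out in a few lines. The function names and all numeric values, including the assumed largest eigenvalue of 3, are hypothetical and serve only to illustrate the direction of each effect.

```python
def peer_influence_growth_rate(q, a_c):
    """Growth rate: maximum peer activity flow over critical activity."""
    return q / a_c


def decay_to_growth_ratio(lam, q, a_c):
    """The single model parameter: decay rate divided by growth rate."""
    return lam / peer_influence_growth_rate(q, a_c)


kappa_1 = 3.0                    # assumed largest adjacency eigenvalue
lam, q, a_c = 1.0, 2.0, 4.0      # illustrative baseline values

baseline = decay_to_growth_ratio(lam, q, a_c)
more_flow = decay_to_growth_ratio(lam, 2 * q, a_c)         # better visibility
higher_threshold = decay_to_growth_ratio(lam, q, 2 * a_c)  # harder to notice

for name, ratio in (("baseline", baseline),
                    ("doubled q", more_flow),
                    ("doubled a_c", higher_threshold)):
    print(name, ratio, "inactive state stable:", kappa_1 < ratio)
```

In this example doubling the maximum flow halves the ratio and keeps the system in the active regime, whereas doubling the critical activity doubles the ratio and pushes it above the assumed eigenvalue, so the inactive state becomes attracting again.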

Maximum Peer Activity Flow. The parameter $q$ defines the maximum amount of
activity (peer influence) that can traverse along the edges of the
collaboration network per unit time. If this parameter increases,
$\mu = q/a_c$ will increase as well, resulting in an overall increase in
activity. In contrast, reducing the value of $q$ results in overall
decreasing levels of activity.

[Figure 5: Evolution of Parameters. Panels: (a) Evolution of Critical
Activity Threshold; (b) Coupled Evolution of Parameters.]

Figure 5: Parameter Evolution Scenarios. In a system with (at first)
increasing overall levels of activity and fixed values for $q$ and
$\lambda$ for all users, we expect $a_c$ to slowly increase (see (a)), as
individual contributions are indistinguishable due to a flood of newly
added content (activity). As a consequence, more posts and replies are
required from all users to exercise the same amount of peer influence,
represented by increasing values for $a_c$ over time. After a certain point
in time, $a_c$ will reach a threshold and activity will start to decrease,
unless administrators intervene. In a more realistic scenario (see (b)),
again with increasing levels of overall activity, users will, in addition
to increasing values of $a_c$, start to lose interest in contributing to
the system, represented by increasing values for $\lambda$. As a
consequence, activity will decrease at a faster pace.


Evolution scenarios of Maximum Peer Activity Flow. In real-world systems,
q is best interpreted as a proxy for the efficiency of the user interface,
describing how well information (or influence) is transported (e.g.,
highlighted or visualized) across users. For example, practitioners can
influence the Maximum Peer Activity Flow by adding recommendations for users
to collaborate with or by optimizing the presentation of newly added/edited
content. Note that with increasing numbers of users and levels of activity it
becomes increasingly difficult for practitioners to keep q at its current
level, let alone positively influence the parameter, due to the vast amount of
content and/or activity present in the system.


Critical Activity Threshold. The parameter a_c represents a soft threshold,
which defines when users start to “effectively notice” the actions of their
peers and are, as a consequence, “notably” influenced (see Figure 2(b)) by
them. The larger a_c, the more actions (i.e., posts or replies) are required
by users to positively influence their peers to copy their actions and increase
their activity levels (see Figure [?]).

Evolution scenarios of Critical Activity Threshold. In practice, we would
expect to see an increasing a_c with an increasing number of active users
and levels of activity. For example, in a system with low activity and a
small number of users, each action by a particular user will be noticed
immediately by all others, meaning that the level of a_c is low. However, with
increasing numbers of users and an increase in activity, users have to increase
their number of posts and replies to be noticed by their peers. Hence, the
more active users are present in a system, the harder it becomes for users to
specifically notice each contribution of their peers individually. In the worst
case, users are confronted with an activity overload that might even result in
decreasing levels of (positive) peer influence. In particular, an initial
increase in activity likely leads to an increase in a_c, which in turn
decreases activity in the system. Thus, the evolution of a_c represents a
negative feedback loop in the system. In contrast to q, which serves as a proxy
for the user interface, a_c represents an intrinsic parameter of the users of a
system. Administrators of such networks and websites can influence a_c by
either influencing q (e.g., by adjusting the user interface to better promote
each individual action taken by the peers of a user) or by actively avoiding
and counteracting the activity overload by filtering and reducing the amount of
new content that is displayed at once.

For example, the mechanisms of how Facebook displays posts in its “News
Feed” can be seen as a measure to filter and limit newly added content,
actively avoiding information or activity overloads while maximizing the
(peer) influence of each individual contribution.

Summary of evolution scenarios. If activity increases over time and no
adaptations to the system are implemented, activity will inevitably decrease,
due to a larger Critical Activity Threshold (see Figure 5). To counteract this
development, website administrators could either try to manipulate the Activity
Decay Rate (an intrinsic property that varies per user) or optimize the user
interface, and thus manipulate the Maximum Peer Activity Flow.


3 Empirical Illustration 

We are now interested in modeling and simulating activity dynamics for
empirical datasets. In particular, we investigate activity dynamics for an
array of different websites, consisting of instances of the StackExchange^1
network as well as multiple Semantic MediaWikis^2.

First, we characterize the investigated datasets and outline our methods
for the empirical estimation of the required parameters (see Table [?]). We
then fit our model to the collaboration networks and present the results of
the activity dynamics simulation.

^1 http://www.stackexchange.org/sites
^2 http://www.semantic-mediawiki.org

[Figure 6, panels: (a) History StackExchange; (b) Bitcoin StackExchange; (c) English Language & Usage StackExchange; (d) Mathematics StackExchange; (e) Beachapedia Wiki; (f) Nobbz Wiki; (g) NeuroLex Wiki; (h) 15Mpedia Wiki]

Figure 6: Degree Distribution of Empirical Collaboration Networks.
Visualization of the degree distribution of all investigated collaboration
networks. The top row (a to d) depicts the different StackExchange
collaboration networks, while the bottom row (e to h) shows the collaboration
networks of the different Semantic MediaWiki instances. The majority of
users, across all collaboration networks, exhibit between 0 and 10
collaboration edges.


3.1 Datasets 


We selected a total of four differently sized instances from the StackExchange
network as well as four different Semantic MediaWiki instances to model
activity dynamics. In particular, we concentrate our efforts on the History
StackExchange^3 (HSE), which is the smallest of the StackExchange datasets
and allows users to discuss topics and questions related to history and
historical events. The Bitcoin StackExchange^4 (BSE) and the English
Language & Usage StackExchange^5 (ESE) represent two medium-sized websites
and are platforms for asking and discussing questions related to the mining,
buying and selling of bitcoins and to the English language, respectively.
On the Mathematics StackExchange^6 (MATHSE) website, which also represents
our largest dataset, users can ask and discuss mathematics-related questions
and topics.

We further investigate activity dynamics for the Beachapedia Wiki^7 (BP),
the smallest dataset in our activity dynamics analysis, which strives
to create a structured knowledge base for a variety of topics on beaches
in the United States. The medium-sized German Nobbz Wiki^8 (NZ) provides
a structured knowledge base and discussion platform for the online
game “Die Verdammten”^9. The second largest dataset, the NeuroLex Wiki^10
(NLX), represents a large and semantically enriched lexicon of terms and
topics related to neuroscience. Our largest dataset is the 15Mpedia Wiki^11
(15MW), a Spanish Semantic MediaWiki instance that discusses a wide variety
of topics related to Spain and its different areas and regions.

In general, the investigated datasets are very diverse in their
characteristics. For example, the number of active users ranges from 35,476 in
MATHSE to a total of 16 in BP. For the analyses conducted in this paper we
focus on the last 52 weeks of each dataset. For more detailed information see
Table 2. The degree distributions of all collaboration networks are highly
heterogeneous (cf. Figure 6). For all investigated datasets, the majority of
users exhibit between 0 and 10 collaboration edges. However, in all datasets
there are a few users with a large number of collaboration edges.

^3 http://history.stackexchange.com
^4 http://bitcoin.stackexchange.com
^5 http://english.stackexchange.com
^6 http://mathematics.stackexchange.com
^7 http://www.beachapedia.org
^8 http://nobbz.de/wiki
^9 http://www.dieverdammten.de/
^10 http://neurolex.org/
^11 http://wiki.15m.cc/wiki/Portada

From each of these datasets we extracted a collaboration network for the
tasks of fitting the model and simulating activity dynamics. To this end, we
first parsed the change-logs of all datasets. Each user who has contributed at
least one question, answer or comment for the StackExchange datasets, or
created or edited an article for the Semantic MediaWikis, is represented as
a node in the corresponding collaboration network. Edges between users
represent collaboration and are undirected. For the StackExchange datasets,
we defined collaboration as either posting an answer to a question or posting
a comment on the initial question or an answer. For the Semantic MediaWiki
instances, we have created an edge between users who (chronologically and)
successively changed the same article (cf. Figure 7). Edges with the same
source and target user have been removed in all datasets.
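The construction rules above can be sketched as follows. This is only an illustration under stated assumptions: the input formats (a list of `(reply_author, parent_author)` pairs and a per-article list of editors in chronological order) and the function names are hypothetical, and repeated collaborations are collapsed into a single unweighted edge, which the text does not specify.

```python
def reply_edges(replies):
    """Undirected edges for Q&A data.

    `replies` is an iterable of (reply_author, parent_author) pairs, one per
    answer or comment; the parent is the author of the post replied to.
    Self-loops (same source and target user) are dropped.
    """
    return {tuple(sorted(pair)) for pair in replies if pair[0] != pair[1]}


def wiki_edges(revisions):
    """Undirected edges for wiki data.

    `revisions` maps an article title to its editors in chronological order;
    users who edited the same article successively are connected.
    """
    edges = set()
    for editors in revisions.values():
        for earlier, later in zip(editors, editors[1:]):
            if earlier != later:  # skip self-loops
                edges.add(tuple(sorted((earlier, later))))
    return edges


qa = reply_edges([("bob", "alice"), ("alice", "bob"), ("carol", "carol")])
wiki = wiki_edges({"Beach": ["u1", "u2", "u2", "u3", "u1"]})
```

Sorting each pair before storing it is one simple way to make the edges undirected, so that (bob, alice) and (alice, bob) count as the same edge.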

Further, users with zero collaboration edges are initialized analogously to
all other users and are not specifically filtered from our datasets. However,
due to missing positive peer influence, activity will inevitably (as long as
λ/μ > 0) converge towards zero for these users.

Note that the presented approach for creating collaboration networks
represents just one of many different possibilities to create such networks and
is analogous to (undirected) co-authorship networks as presented in
Newman [?]; Barabási et al. [?]. Given that the created collaboration networks
are based on interactions between users, we assume similar characteristics to
social networks, particularly with regard to potential peer influence [6].

3.2 Parameter Estimation with Least-Squares 

To estimate λ/μ for (preprocessed) empirical datasets we resort to an
output-error estimation method. First, we formulate the estimation of the model
parameter as an optimization problem. As objective function we use a
well-known least-squares cost function. Second, we solve the optimization
problem numerically, using the method of gradient descent in combination with
Newton’s method to speed up the calculations. Finally (as a proof of
concept), we evaluate the accuracy of the ratio estimate by calculating
prediction errors on unseen data. Next, we describe these estimation steps in
more detail.

Figure 7: Collaboration Network Construction. This plot depicts the
different elements of the StackExchange and Semantic MediaWiki datasets
that have been classified as posts and replies (cf. Table 2), as well as the
edges that have been drawn between certain entities and change-actions and
that represent collaboration in our collaboration networks.


Table 2: Dataset statistics. Note that all datasets differ in the number of
users, collaboration edges and activity. Users refers to the number of unique
users that have contributed more than one post or reply to the corresponding
datasets within our observation periods. Posts represent newly created
questions in the case of the StackExchange network and newly created articles
in the case of the Semantic MediaWiki datasets. Replies are either comments
or answers for all StackExchange datasets and edits of existing articles for
Semantic MediaWikis. κ_1 denotes the largest eigenvalue of the corresponding
collaboration network. For our experiments we limited our observation
periods to the last 52 + 3 weeks of each dataset.

Dataset           HSE      BSE      ESE       MATHSE    BP       NZ       NLX      15MW
Users             682      1,299    7,893     35,476    16       36       112      394
Edges             5,179    5,528    83,457    477,133   38       125      383      772
κ_1               54.33    43.88    162.04    303.58    6.71     11.46    18.4     19.97
Posts & Replies   12,496   12,295   151,028   986,996   2,718    603      33,792   102,521
Weeks             52 + 3   52 + 3   52 + 3    52 + 3    52 + 3   52 + 3   52 + 3   52 + 3


Preprocessing. First, we aggregate all activities per user per day and
apply a rolling mean of 7 days to smooth and reduce strong fluctuations in
activity, which are likely caused by external influences. Second, we further
aggregate the smoothed activities per user and per (calendar) week. For
additional noise reduction we remove all users that have contributed fewer
than one post or reply in the smoothed dataset during our observation period,
as well as the first and last week of our datasets if they contain fewer than
7 days of activity data. Finally, since we only want to illustrate the
practical application of our model on the empirical data, we extract the last
52 + 3 weeks from all our datasets. Note that the 3 additional weeks are
required to calculate a ratio for the simulation of activity for the first
week.
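A minimal sketch of these preprocessing steps is given below. It makes several simplifying assumptions that are not taken from the text: the rolling mean uses a trailing window that is shortened at the start of the series, weeks are consecutive 7-day blocks rather than calendar weeks, and all function names are hypothetical.

```python
def rolling_mean(daily, window=7):
    """Trailing rolling mean; the window is shorter for the first days."""
    smoothed = []
    for day in range(len(daily)):
        chunk = daily[max(0, day - window + 1): day + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def weekly_totals(daily, days_per_week=7):
    """Sum activity per full week; a trailing incomplete week is dropped."""
    full_weeks = len(daily) // days_per_week
    return [sum(daily[w * days_per_week:(w + 1) * days_per_week])
            for w in range(full_weeks)]


def preprocess(daily_by_user):
    """Smooth, aggregate per week, and drop users with less than one action."""
    weekly = {user: weekly_totals(rolling_mean(days))
              for user, days in daily_by_user.items()}
    return {user: weeks for user, weeks in weekly.items() if sum(weeks) >= 1}


# A single burst of 14 actions on day 7 is spread over the following days.
result = preprocess({"a": [0] * 6 + [14] + [0] * 7, "b": [0] * 14})
```

In this toy input the burst of user "a" is smeared across two weeks while the total of 14 actions is preserved, and the inactive user "b" is removed.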


Formulating estimation as an optimization problem. Depending on
the particular application of the model we may need to introduce a suitable
objective function. For example, we may be interested in applying our model
to analyze and simulate the aggregated levels of activity in a system. In other
words, we are interested in the overall activity level in a system, rather than
in the particular activity distribution over the users (see below for another
example involving user activity levels). Hence, we formulate the objective
function (see Equation 21) as a least squares cost function, which calculates
the error of the sum of activity over multiple data points over a certain period
of time T:

\min_{\lambda/\mu} \; \frac{1}{T} \sum_{k=0}^{T-1} \left( \sum_{i=1}^{n} x_i(k+1) - \sum_{i=1}^{n} \hat{x}_i(k+1) \right)^2,    (21)

where x_i(k) is the empirically observed activity of user i at time k, x̂_i(k)
is the estimated activity for user i at time k, and n is the total number of
users as before.
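The aggregated cost can be sketched in a few lines. The sketch assumes the 1/T normalization and takes observed and estimated activities as lists of per-user values, one list per data point; the function name and data layout are illustrative choices, not the authors' implementation.

```python
def aggregated_cost(observed, estimated):
    """Mean squared error between total observed and total estimated activity.

    observed[k] and estimated[k] hold the per-user activities of data point k.
    """
    T = len(observed)
    return sum((sum(obs) - sum(est)) ** 2
               for obs, est in zip(observed, estimated)) / T


# Week 1: totals agree (3 vs. 3) although users differ; week 2: 7 vs. 5.
cost = aggregated_cost([[1, 2], [3, 4]], [[2, 1], [3, 2]])
```

Because only totals enter the cost, redistributing activity among users (as in the first week of the example) does not change it.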

To calculate the estimates x̂_i(k) we numerically integrate the differential
equations from our model by applying Euler’s method for solving differential
equations computationally. Thus, we approximate the time evolution of x_i
between all time steps k and k + 1 (for each of these steps we set the total
time to τ) by iterating:

x_{i,t+1}(k) = x_{i,t}(k) + \Delta\tau \left[ -\frac{\lambda}{\mu} x_{i,t}(k) + \sum_{j} A_{ij} \frac{x_{j,t}(k)}{\sqrt{1 + x_{j,t}(k)^2}} \right],    (22)

where we set x_{i,t=0}(k) = x_i(k), ∀ i, k and use the current estimate for
λ/μ to perform calculations. The final equation for x̂_i(k + 1) becomes:

\hat{x}_i(k+1) = x_i(k) + \Delta\tau \sum_{t=0}^{\tau/\Delta\tau - 1} \left[ -\frac{\lambda}{\mu} x_{i,t}(k) + \sum_{j} A_{ij} \frac{x_{j,t}(k)}{\sqrt{1 + x_{j,t}(k)^2}} \right].    (23)
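The Euler iteration can be sketched as follows. The sketch assumes the model form with a decay term proportional to λ/μ and a peer influence term x_j / sqrt(1 + x_j^2) summed over neighbours, represents the network as neighbour lists, and sets the total time per data point to τ = 1; these choices and the function name are assumptions made for illustration.

```python
import math


def euler_week(x, adjacency, ratio, delta_tau=0.01, tau=1.0):
    """Integrate the activity of all users from data point k to k + 1.

    x          -- activities at data point k (one value per user)
    adjacency  -- adjacency[i] lists the neighbours of user i
    ratio      -- current estimate of lambda/mu
    """
    steps = round(tau / delta_tau)
    x = list(x)
    for _ in range(steps):
        influence = [v / math.sqrt(1.0 + v * v) for v in x]
        x = [xi + delta_tau * (-ratio * xi
                               + sum(influence[j] for j in adjacency[i]))
             for i, xi in enumerate(x)]
    return x


triangle = [[1, 2], [0, 2], [0, 1]]      # three mutually connected users
isolated = euler_week([1.0], [[]], ratio=1.0)
growing = euler_week([1.0, 1.0, 1.0], triangle, ratio=0.5)
```

For the isolated user the update reduces to repeated multiplication by (1 - Δτ · λ/μ), while in the triangle a ratio below the largest eigenvalue (2 for this graph) lets activity grow.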


The local approximation error of Euler’s method is of the order
O(Δτ²) and the global error is of the order O(Δτ). To perform the integration
between steps k and k + 1 we need to iterate for τ/Δτ steps, where Δτ needs to
be chosen with care. In general, if we set Δτ too high (meaning that the
calculations are less computationally intensive, as we have to run a smaller
number of iterations), the accuracy of our simulation (including the
estimation of the ratio) will decline, as the potential error per iteration due
to our approximations becomes higher. This error can become so large that it
leads to numerical instability, meaning that the overall activity in
a system can become negative, which might cause activity to diverge
towards ±∞. With certain combinations of the network structure, Δτ and the
calculated ratios, activity can become negative without diverging, oscillating
around the fixed point of zero activity until convergence. In contrast, if we
set Δτ too low we end up with a very precise simulation, although the time
necessary to compute the simulation will be much higher, as a much larger
number of iterations has to be executed.
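As a rough worked example of this trade-off, consider a single user without collaboration edges, so that the peer influence term vanishes. This special case is an assumption made here purely for illustration; with network coupling the thresholds shift.

```latex
x_{t+1} = \left(1 - \Delta\tau \,\frac{\lambda}{\mu}\right) x_t
\quad\Longrightarrow\quad
x_t = \left(1 - \Delta\tau \,\frac{\lambda}{\mu}\right)^{t} x_0 ,
\qquad
\begin{cases}
0 < \Delta\tau \,\lambda/\mu < 1 & \text{monotone decay towards zero,} \\
1 < \Delta\tau \,\lambda/\mu < 2 & \text{sign changes each step, still converging to zero,} \\
\Delta\tau \,\lambda/\mu > 2 & \text{oscillation with growing amplitude (divergence).}
\end{cases}
```

The middle case corresponds to activity becoming negative without diverging, and the last case to numerical instability.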


Figure 8: Illustrations with Synthetic Data. The plots depict the results
of the activity dynamics simulations for Zachary’s Karate Club network with
synthetic activity values (left y-axes) and the corresponding ratios (right
y-axes). The black solid lines with x markers represent the simulated activity
over τ (in weeks; x-axes). The solid gray lines with circles represent
synthetic activities; the gray dotted lines with diamonds represent the ratios
corresponding to the simulated activities. With increasing and decreasing
activities, the ratios become smaller (see (a)) and larger (see (b)). When
setting activity randomly (see (c)) the ratio adjusts analogously.


Numerical solution of the optimization problem. We solve the
optimization problem numerically using the method of gradient descent. The
first derivative of the objective function (Equation 21) defines the update
rule or gradient, which directs if and to what extent we have to increase
or decrease λ/μ to minimize the error of the sum of activities over all data
points during T.

Once we have calculated the first derivative with the current values of the
estimated activities, we update the ratio by the derivative multiplied with
the learning rate η. Thus, the complete procedure is as follows. First, we
initialize our estimation by using κ_1 for the first simulation. Second, we
estimate the activities and calculate the gradient with these estimates.
Third, we calculate the error between our simulated and empirical values, and
adapt the ratio according to the corresponding update function and step size
η. Fourth, we repeat this process until the calculated update for the ratio is
smaller than a given convergence criterion (e.g., 10^−[?]) or until we reach a
total of 20,000 iterations without reaching convergence. Additionally, we have
also implemented Newton’s method, which in our cases substantially reduces the
computation time. In all our experiments we set T to four weeks, meaning
that we optimize the objective function by calculating the optimal ratio over
a span of four data points (weeks).
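The procedure can be sketched end to end on a toy network. Several parts are assumptions rather than the authors' implementation: the gradient is approximated by central finite differences instead of an analytic derivative, Newton's method is omitted, and the step size, tolerance and three-user network are made-up values.

```python
import math


def simulate(x, adjacency, ratio, delta_tau=0.05, steps=20):
    """Euler integration over one data point (assumed total time of 1)."""
    x = list(x)
    for _ in range(steps):
        f = [v / math.sqrt(1.0 + v * v) for v in x]
        x = [xi + delta_tau * (-ratio * xi + sum(f[j] for j in adjacency[i]))
             for i, xi in enumerate(x)]
    return x


def cost(ratio, weeks, adjacency):
    """Mean squared error of aggregated activity, one step ahead per week."""
    errors = [(sum(weeks[k + 1]) - sum(simulate(weeks[k], adjacency, ratio))) ** 2
              for k in range(len(weeks) - 1)]
    return sum(errors) / len(errors)


def fit_ratio(weeks, adjacency, start, eta=0.1, tol=1e-10,
              max_iter=20000, h=1e-6):
    """Gradient descent on the ratio with a finite-difference gradient."""
    ratio = start
    for _ in range(max_iter):
        grad = (cost(ratio + h, weeks, adjacency)
                - cost(ratio - h, weeks, adjacency)) / (2 * h)
        update = eta * grad
        ratio -= update
        if abs(update) < tol:  # convergence criterion on the update size
            break
    return ratio


# Synthetic check: generate 4 + 1 weeks with a known ratio, then recover it.
triangle = [[1, 2], [0, 2], [0, 1]]           # largest eigenvalue is 2
weeks = [[1.0, 2.0, 3.0]]
for _ in range(4):
    weeks.append(simulate(weeks[-1], triangle, 2.5))
estimate = fit_ratio(weeks, triangle, start=2.0)  # initialized with kappa_1
```

Because the synthetic weeks are generated by the same simulator, the cost is zero at the preset ratio, so the estimate should approach 2.5 for this toy setup.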

Evaluation of the parameter estimates. We evaluate the accuracy of
the estimated parameters by cross-validation (leave-one-out method). In
particular, we use the estimated ratios over 4 weeks to simulate activity for
the succeeding week. For example, we calculate the optimal λ/μ (according
to our objective function) for weeks 1-4 and predict activity for week
5. Next, we use the empirical data of weeks 2-5 to calculate the ratio
to predict activity for week 6. Hence, we calculate a total of 52 ratios to
simulate activity for a total of 52 weeks.
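The sliding evaluation scheme from the example (weeks 1-4 predict week 5, weeks 2-5 predict week 6) can be enumerated as below. The generator name and the 1-based week numbering are illustrative choices; how the windows align with the 52 + 3 weeks of each dataset is not restated here.

```python
def rolling_windows(n_weeks, window=4):
    """Yield (training_weeks, target_week) pairs with 1-based week numbers."""
    for first in range(1, n_weeks - window + 1):
        yield list(range(first, first + window)), first + window


plan = list(rolling_windows(6))
```

For six weeks of data this yields exactly the two steps described in the text.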

As depicted in Figure 8, we have created three synthetic scenarios to test
and illustrate the mechanisms of the Activity Dynamics Model. First, we
estimate λ/μ (right y-axes; gray dotted lines with diamonds) for the three
scenarios with synthetically created increasing, decreasing and variable or
random activities (left y-axes; gray solid lines with circles) over 10 + 3
weeks (x-axes). In all three scenarios we use Zachary’s Karate Club as the
underlying collaboration network. Due to our parameter estimation process
the simulated levels of activity (left y-axes; black solid lines with x
markers) exhibit a small lag when activity steadily moves in one direction
(i.e., increases or decreases). On the other hand, small fluctuations (see
weeks 6-9 in Figure 8(c)) are mitigated. The ratios (right y-axes), which
correspond to the simulated levels of activity in the same week, are depicted
as well.


Discussion on parameter estimation method. To validate the correctness
of our implementation of the method of least squares, we have simulated
activity for datasets with a preset ratio (and random weights for
initialization) for 3 weeks. We then used the random activity initialization
values, as well as the activity values for each of the 3 weeks, as input for
the calculation of the ratio with the method of least squares. Using this
approach, we were able to estimate previously set ratios with negligibly small
errors. When adding noise to the simulated activity values, the obtained
ratios were accordingly less accurate.

Note that the estimation and validation method that we apply is only one
of many possible methods. In this paper, we want to illustrate the general
applicability of our method as well as its potential to gather new insights
into the intricate dynamics of activity in online collaboration networks. We
measure the accuracy of the prediction only as a general proof of concept
of our model and leave further investigations of the predictive power of our
method open for future work. Following up on this notion, we now briefly
discuss some alternative approaches for formulating the objective function
and their implications.


Alternative objective functions. To demonstrate the versatility of our
model, if we are interested in answering questions about the distribution of
the activities over users, we may change the formulation of the objective
function to calculate ratios that minimize the error of activity per user and
per data point (see Equation 24). Note that when optimizing towards
aggregated levels of activity, we obtain ratios that characterize the systems.
In contrast, with the adapted objective function, we are interested in learning
more about the users of such systems. The alternative objective function
may be defined as follows:

\min_{\lambda/\mu} \; \frac{1}{T} \sum_{k=0}^{T-1} \left\| x(k+1) - \hat{x}(k+1) \right\|^2,    (24)

where x and x̂ are now n-dimensional vectors storing the activities of all n
users. Thus, this objective function represents the sum of squared errors
calculated for each of the n users of the corresponding systems over a total
of T data points.

We have estimated λ/μ and simulated activity for HSE using this
objective function. In contrast to the aggregated levels of activity, we obtain
a more accurate distribution of activities across all users, as was intended.
However, each of the 4 data points in T now corresponds to a vector of n
users, as opposed to a single value (the aggregated activities), resulting in
either much higher computation times, a larger error for the prediction tasks
or both.

Additionally, to tackle the prediction problem and to avoid overfitting we
may introduce a regularization term to the objective function. For example,
we might be interested in keeping the ratio or the difference between the ratio
and κ_1 small. In the latter case we would add a term such as γ(κ_1 − λ/μ)²
to our objective function, where γ represents the strength of regularization.
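Both variations can be sketched briefly. The per-user cost below assumes the same 1/T normalization as the aggregated objective, and the regularized variant simply adds the penalty term to an already computed cost value; the function names and numbers are hypothetical.

```python
def per_user_cost(observed, estimated):
    """Squared error summed over users, averaged over the T data points."""
    T = len(observed)
    return sum((o - e) ** 2
               for obs, est in zip(observed, estimated)
               for o, e in zip(obs, est)) / T


def regularized(cost_value, ratio, kappa_1, gamma):
    """Add the penalty gamma * (kappa_1 - ratio)^2 to a cost value."""
    return cost_value + gamma * (kappa_1 - ratio) ** 2


base = per_user_cost([[1, 2], [3, 4]], [[2, 1], [3, 2]])
penalized = regularized(base, ratio=2.5, kappa_1=2.0, gamma=0.5)
```

Unlike the aggregated cost, the per-user cost penalizes the first data point of the example, where the totals agree but activity is assigned to the wrong users.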

We leave a detailed analysis and comparison of different objective
functions open for future work. The ratios calculated to minimize the error for
aggregated activity levels exhibit higher accuracy in our simulations (in terms
of overall activity per month). The trade-off for a more accurate distribution
of activities over users with the changed objective function is worse results
for the simulation of activity, as not only the aggregated activity levels are
considered, but the vector of activities of all users in our datasets over
multiple points in time. However, these ratios provide a better overall
correlation between simulated and empirical activities per contributor of our
system.

3.3 Illustration on Empirical Datasets 

After calculating λ/μ and setting Δτ we simulate activity in our
collaboration networks. Due to our chosen approximations, the main goal of the
presented illustration is not to predict activity in collaboration networks.
Rather, we are interested in demonstrating that our assumptions regarding
the Activity Decay Rate and the Peer Influence Growth Rate hold and allow
us to simulate trends in activity dynamics for given and real values. Further,
by modeling and simulating activity dynamics for empirical datasets we not
only deepen our understanding of the model but also, depending on the
values of the parameters, potentially obtain new insights into the systems
under investigation.

Figure 9: Results for the activity dynamics simulation. The plot
depicts the results of our activity dynamics simulation for the StackExchange
datasets (top row) and Semantic MediaWiki instances (bottom row). The
solid gray lines with circles represent the empirical (observed) activity over
τ (in weeks; x-axes), while the solid black lines represent the simulated
activity dynamics (y-axes). In all of our analyzed datasets, the simulated
activity dynamics exhibit a notable resemblance to the empirical activity.

Figure 9 depicts the results of the activity dynamics simulation. The root
mean-squared errors (RMSEs) of the simulations are listed in Table 3.
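For reference, a per-user and per-week RMSE can be computed as sketched below. Averaging over all user-week pairs is an assumption about the normalization, and the function name and data layout are illustrative.

```python
import math


def rmse(observed, simulated):
    """Root mean-squared error over all (week, user) pairs."""
    squared = [(o - s) ** 2
               for obs, sim in zip(observed, simulated)
               for o, s in zip(obs, sim)]
    return math.sqrt(sum(squared) / len(squared))


error = rmse([[1, 2], [3, 4]], [[1, 2], [3, 6]])
```

In the example a single deviation of 2 among four user-week values gives an RMSE of 1.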

Overall, the results gathered from the activity dynamics simulation
exhibit a notable resemblance to the real activities of the corresponding
datasets. Due to the chosen approximations and simplifications when estimating
λ/μ for our model (i.e., static network structure and average model parameters
over weeks and users), the simulated activity is naturally limited in its
accuracy. These limitations are particularly visible whenever there are large
and sudden increases of activity in the collaboration networks. Note that λ/μ
will only be higher than κ_1 if activity in our datasets is either zero or the
relative difference in activity between two months is extremely high, which
is never the case for our smoothed empirical datasets.

Further, the assumption of a fixed network structure of our investigated
collaboration networks also (negatively) influences the obtained results of
our simulation. For example, it is possible for our simulation to yield higher
increases in activity (e.g., Figure 9(b)), as users might be influenced by
peers who would join the collaboration network only at a later point in time.


4 System Mass and Activity Momentum 

We can further analyze the obtained ratios and parameters of our activity
dynamics simulation to broaden our understanding of the collaboration
networks under investigation. Figure 10 depicts the value of the calculated
ratios λ/μ (y-axis) for each week (x-axis). If the ratio is higher than κ_1
(denoted in the title of each plot), our master stability equation holds and
the system converges towards zero activity (over time). The amount of activity
that is lost per iteration, and hence the speed of activity loss, is
proportional to the value of the ratio and the activity already present in the
network. In general, a higher ratio results in a higher and faster loss of
activity.
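Comparing a ratio with κ_1 requires the largest eigenvalue of the collaboration network. A minimal sketch using power iteration is shown below; it assumes a connected, undirected network given as neighbour lists and iterates on A + I (a shift chosen here so that bipartite networks also converge). The function name and iteration count are illustrative.

```python
import math


def largest_eigenvalue(adjacency, iterations=500):
    """Approximate the largest adjacency eigenvalue by power iteration.

    adjacency[i] lists the neighbours of node i (undirected, connected).
    """
    n = len(adjacency)
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        w = [v[i] + sum(v[j] for j in adjacency[i]) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    av = [sum(v[j] for j in adjacency[i]) for i in range(n)]
    return sum(a * c for a, c in zip(av, v))  # Rayleigh quotient with |v| = 1


kappa_triangle = largest_eigenvalue([[1, 2], [0, 2], [0, 1]])
kappa_path = largest_eigenvalue([[1], [0, 2], [1]])
holds = 2.5 > kappa_triangle  # master stability equation for a ratio of 2.5
```

For the triangle the largest eigenvalue is 2, so an example ratio of 2.5 satisfies the master stability equation, whereas a ratio of 1.5 would not.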


Table 3: RMSE. The table depicts root mean-squared errors (RMSE) of
our activity dynamics simulation per user and week for all datasets. Our
simulation yields a small RMSE for all StackExchange datasets. RMSE for
the Semantic MediaWiki datasets is slightly higher, which is likely due to
the lower number of active users (listed in the Users column).

Dataset    Activity   Users    RMSE
HSE        12,496     682      0.076
BSE        12,295     1,299    0.031
ESE        151,028    7,893    0.029
MATHSE     986,996    35,476   0.030
BP         2,718      16       1.755
NZ         603        36       0.274
NLX        33,792     112      4.397
15MW       102,521    394      4.043


Figure 10: Evolution of ratios λ/μ. The evolution of the ratios λ/μ
(y-axes) over τ (in weeks; x-axes) for the StackExchange datasets (top row)
and for the Semantic MediaWiki instances (bottom row). The smaller the
ratio, the higher the levels of activity in Figure 9. Small variances in λ/μ
over time indicate that activities of the systems are less influenced by the
activity of single individuals than they are by peer influence.


If the ratio is smaller than κ_1, the master stability equation has been
invalidated and the system will converge towards a new fixed point of
immanent activity (cf. Section 2.2). If this is the case, we can observe one of
three potential behaviors, which are triggered depending on the amount of
activity already present in the network and the current ratio:

(i) An increase in activity if the new fixed point, corresponding to
the new ratio, is of higher overall activity than the activity already
present in the collaboration network (see τ = 20-30 in Figures 9(d)
and 10(d)). This situation emerges whenever we invalidate the master
stability equation from a previously stable fixed point or if the system
is already stable in a situation when the new ratio is smaller than the
last estimated ratio.

(ii) A decrease in activity if the new fixed point is of lower overall
activity than the activity already present in the collaboration network
(see τ = 3-7 in Figures 9(b) and 10(b)). Again, this may occur in
two specific situations. First, if the ratio increases so that the master
stability equation is now satisfied and the system has previously been
in an unstable state. Second, if the system is in an unstable state but
the ratio increases slightly without satisfying the stability equation.

(iii) No change in activity if the new fixed point corresponding to the new
ratio is of the same overall activity as the activity already present in
the collaboration network (see τ = 20-30 in Figures 9(b) and 10(b)).


System Mass. We can now use the obtained ratios to characterize the
collaboration networks and quantify their robustness in terms of their activity
dynamics. Robust systems are systems with lively and high levels of activity,
which are able to keep that activity even in the case of small unfavorable
changes in the dynamical parameters. Less robust systems are systems that
lose their activity very quickly as a consequence of even small changes in
the ratio. Thus, we calculate the standard deviation σ_{λ/μ} over all ratios
over time and normalize it over κ_1 (to account for the size of the
collaboration networks) and refer to it as ρ, the normalized standard deviation
of the ratio λ/μ (see Equation 25).

\rho = \frac{\sigma_{\lambda/\mu}}{\kappa_1}    (25)

The normalized standard deviation is a measure of system sensitivity and
its inverse (1/ρ) represents a measure of system stability or inertia to
changes in activity. Analogously to mass in classical mechanics, which defines
the inertia or resistance of an object to being accelerated or decelerated by a
given force, we call the quantity 1/ρ the System Mass. We denote this
quantity with m_s, with the subscript s to distinguish it from the number of
links m in a collaboration network (see Table [?]). In systems with a large
System Mass it is more difficult to induce changes in activity. In particular,
this means that it is more difficult to reduce activity in a consistently active
system (due to the small standard deviations of λ/μ), just as it is difficult
to jump-start the same system if activity levels were consistently low in the
past (again, due to small standard deviations of λ/μ).

Activity Momentum. After calculating the System Mass m_s, we are now
interested (again analogously to classical mechanics) in calculating the
Activity Momentum p for our collaboration networks (see Equation 26).

p = m_s \, a    (26)

For activity we take (i) the average activity (posts and replies) per week and
(ii) the activity in the last month of our observation periods (cf. Table 4)
and calculate (i) the average and (ii) the current momentum.
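System Mass and Activity Momentum follow directly from the weekly ratios, as the sketch below illustrates. Using the population standard deviation (`statistics.pstdev`) and returning an infinite mass for constant ratios are assumptions of this sketch; the function names are hypothetical.

```python
import math
import statistics


def system_mass(ratios, kappa_1):
    """Return 1/rho, where rho = std(ratios) / kappa_1.

    The population standard deviation is an assumption; a constant ratio
    series yields rho = 0 and is mapped to an infinite mass.
    """
    rho = statistics.pstdev(ratios) / kappa_1
    return math.inf if rho == 0 else 1.0 / rho


def activity_momentum(mass, activity):
    """Momentum as the product of System Mass and activity."""
    return mass * activity


toy_mass = system_mass([1.0, 3.0], kappa_1=4.0)
ese_momentum = activity_momentum(29.07, 2952)  # ESE values from Table 4
```

Plugging in the System Mass and average weekly activity reported for ESE gives a value that rounds to the 85,815 listed in Table 4.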

Table 4: System Mass and Activity Momentum. The table depicts the
results for the activity momentum analysis. ρ is the standard deviation of
the calculated ratios normalized over κ_1. System Mass is represented by 1/ρ
and Activity Momentum represents System Mass multiplied with Activity.
Activity depicts the average activity per week as well as the value for the
last observed month in brackets. Activity Momentum follows analogously.
MATHSE and ESE exhibit the largest average and current Activity Momenta,
followed by 15MW and NLX. Even though 15MW exhibits a System Mass
similar to HSE and NZ, its Activity Momentum is much larger.

Dataset   Activity (last month)   ρ        System Mass   Activity Momentum (last month)
MATHSE    19,255 (70,130)         0.0115   86.65         1,674,415 (6,076,765)
ESE       2,952 (13,751)          0.0344   29.07         85,815 (399,742)
BSE       246 (782)               0.0762   13.12         3,228 (10,260)
HSE       248 (1,110)             0.0554   18.10         4,489 (20,091)
15MW      1,999 (4,702)           0.0506   19.76         39,500 (92,912)
NLX       668 (1,131)             0.0532   18.80         12,558 (21,263)
NZ        12 (270)                0.0802   12.67         152 (3,421)
BP        54 (228)                0.0547   18.28         987 (4,168)


The higher the Activity Momentum of a collaboration network, the more
force is needed to “stop” the system (make it inactive). Hence, the higher the
momentum, the more robust a given network. In particular, if a (sufficiently)
small number of users suddenly stopped contributing to a collaboration
network that exhibits a very large Activity Momentum p, activity in the
overall network would be minimally influenced. On the other hand, if the
same number of users stopped contributing to a collaboration network
with a (significantly) smaller Activity Momentum p, chances are that their
actions (or lack thereof) would have a notable influence on the overall trends
in activity dynamics of the system. In particular, there are three factors that
influence the Activity Momentum of collaboration networks:

(i) The standard deviation of λ/μ. If the ratio is very stable and does not
frequently oscillate, the standard deviation, and hence the normalized
standard deviation, will be very small. This also means that activity,
as well as increases and decreases thereof, is evenly distributed across
T and does not (frequently) occur in bursts.

(ii) The largest eigenvalue κ_1. Larger and denser collaboration networks
exhibit a larger largest eigenvalue κ_1. As ρ is the standard deviation
of the ratios normalized over κ_1, the largest eigenvalue directly
influences ρ. Normalizing over κ_1 follows the intuition that large
collaboration networks are less likely to exhibit sudden changes
in activity than smaller ones.

(iii) The activity. The larger the average activity (posts and replies) per
month, the higher the Activity Momentum of a collaboration network,
and hence the higher the force that is needed to render the collaboration
network inactive. Analogously, networks with a small Activity Momentum
require less force to be influenced (i.e., to either speed up/increase
or slow down/decrease activity).
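
To make the first two factors concrete, the following sketch computes the normalized standard deviation of a series of ratios and the resulting System Mass. It is an illustration under stated assumptions, not the released implementation: the function name, the use of the population standard deviation (`statistics.pstdev`), and the example values are all hypothetical.

```python
from statistics import pstdev


def system_mass(ratios, largest_eigenvalue):
    """Return (normalized_std, system_mass) for a series of lambda/mu ratios.

    normalized_std = std(ratios) / largest_eigenvalue
    system_mass    = 1 / normalized_std
    """
    if largest_eigenvalue <= 0:
        raise ValueError("largest eigenvalue must be positive")
    normalized_std = pstdev(ratios) / largest_eigenvalue
    if normalized_std == 0:
        raise ValueError("constant ratios: System Mass is unbounded")
    return normalized_std, 1.0 / normalized_std


# Hypothetical weekly ratios and largest eigenvalue, for illustration only.
weekly_ratios = [10.0, 12.0, 11.0, 9.0, 13.0]
normalized_std, mass = system_mass(weekly_ratios, largest_eigenvalue=50.0)
print(f"normalized std = {normalized_std:.4f}, System Mass = {mass:.2f}")
```

A stable ratio series or a large eigenvalue both shrink the normalized standard deviation and therefore increase the System Mass, which mirrors factors (i) and (ii).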

Hence, we can use the calculated Activity Momentum p as an indicator
of the activity level as well as the tendency of a system to stay at that
activity level in the future. For example, MATHSE exhibits the most robust
collaboration network of our datasets regarding changes in activity, with an
Activity Momentum of order 10^6 (average per week and last month). ESE
and 15MW both exhibit similar average Activity Momenta of order 10^4.
However, when looking at the Activity Momenta of the last month, ESE is
roughly four times as hard to stop as 15MW.

In contrast, HSE and BSE exhibit very similar activity levels for the last
month; however, the corresponding Activity Momentum of HSE is twice that
of BSE, indicating that rendering BSE inactive requires only half the force
that would be needed to render HSE inactive. The other datasets follow
analogously.

On the other hand, BP exhibits a high value for System Mass and a
very low corresponding Activity Momentum, indicating that it will be very
difficult to accelerate or jump-start the system with regard to activity.


5 Related Work


The work presented in this paper was inspired by and builds upon work 
presented in the areas of critical mass theory and dynamical systems on 
networks. 

5.1 Critical Mass Theory 

In 1985 and 1988, Oliver et al. [68], Oliver and Marwell [69], and Marwell
et al. [56] discussed and analyzed the concept of critical mass theory by
introducing so-called production functions to characterize decisions made by
groups or small collectives. Fundamentally, these production functions
represent the link between individual benefits and benefits for the group.

They argue that one very important aspect of critical mass is the natural
limitation of collective goods for groups, such as housing, food, fuel or oil.
Hence, the capacity of users (and thus critical mass) for such a group or
system is naturally limited by the corresponding resource. However,
collective (digital) goods are not (or only artificially) limited for online
communities, theoretically allowing for an infinite increase in users and
interest. Without users motivated to contribute, interest will decrease and
critical mass will lose momentum and ultimately decelerate until all interest
vanishes. In their work, they identified multiple types of production
functions, the most important ones being accelerating, decelerating and
linear functions. The idea behind accelerating production functions is that
each contribution is worth more than the preceding one. In a decelerating
production function, the opposite is the case: each succeeding contribution
is worth less than the preceding one. Contributions to linearly growing
functions are always worth the same. To this day, it is still mostly unclear
what these production functions look like for online communities (e.g.,
StackOverflow) and online production systems (e.g., Semantic
MediaWikis).
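
The three types of production functions can be illustrated with a small sketch that prints the value added by each successive contribution. The functional forms below (quadratic, square root, identity) are arbitrary assumptions chosen only to show the qualitative shapes; they are not the production functions analyzed by Oliver et al.

```python
def accelerating(contributions):
    """Hypothetical accelerating form: total value grows quadratically."""
    return contributions ** 2


def decelerating(contributions):
    """Hypothetical decelerating form: total value grows like a square root."""
    return contributions ** 0.5


def linear(contributions):
    """Linear form: every contribution adds the same value."""
    return float(contributions)


def marginal_values(production, contributions):
    """Value added by the 1st, 2nd, ..., n-th contribution."""
    return [production(k) - production(k - 1)
            for k in range(1, contributions + 1)]


for function in (accelerating, decelerating, linear):
    values = [round(v, 3) for v in marginal_values(function, 4)]
    print(function.__name__, values)
```

In the accelerating case the marginal values increase, in the decelerating case they decrease, and in the linear case they stay constant.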

Depending on the investigated or desired point of view, different
characteristics of these communities and online production systems can be
used as a basis for calculating production functions. The analysis of Oliver
et al. [68] also highlights that different production functions can lead to
very different outcomes in similar situations. For example, given an
accelerating production function, users who contribute to a system are likely
to find their potential contribution “profitable”, as each subsequent
contribution increases the value of their own contribution. Naturally, this
increases the incentive to make larger contributions to begin with. Given a
decelerating production function, users would not immediately see the benefit
of large contributions, given that each subsequent contribution increases the
overall value less, while more effort, in the form of larger contributions,
is needed to turn a decelerating production function into an accelerating
one.

One approximation for critical mass by Solomon and Wash ua involved
investigating the number of changes (as activity) and the number of users
(as growth of a community) for calculating production functions for
WikiProjects. The authors argue that activity in online production systems,
after certain amounts of time, is the best indicator of a self-sustaining
system. In this work, we have extended the analysis presented by Solomon and
Wash and specifically define the point at which an online system has reached
critical mass and has become self-sustaining in terms of its activity
dynamics. Walk and Strohmaier [87] recently conducted a similar analysis to
characterize critical mass for Semantic MediaWikis.

Raban et al. pa investigated factors that allow for a prediction of survival
rates for IRC channels and identified the production function of these chat
channels, relating the number of unique users to the number of messages
posted at certain times, as the best predictor.

Cheng and Bernstein [22] have analyzed concepts of activation thresholds,
which resemble features that, when achieved, can help to reach and sustain
self-sustainability. They created an online platform that allows groups to
pitch ideas, which are only activated if enough people commit to them.

With regard to activity, Suh et al. [81] have shown that contributions
to Wikipedia are slowing down, which is likely a direct consequence of the
increase in required coordination activities, as well as of comprehensive
contribution guidelines, which discourage posts by users. Kittur and Kraut
[IH] have demonstrated that reducing the overhead for editors, effectively
minimizing the effort necessary to contribute to Wikipedia, can help to
increase the number of contributions and article quality. Similarly,
Anderson et al. [3] investigated the value and development of contributions
to the question answering portal StackOverflow. In contrast, Yang et al. [92]
have investigated the evolution of two different types of users in
StackOverflow, namely sparrows (very active users) and owls (experts in the
discussed topics), and could identify various differences between the two
user groups.

We use the notion of critical mass to define the barrier that has to be
overcome for collaboration networks to become self-sustaining in terms of
activity.


5.2 Dynamical Systems on Networks 

Dynamical systems in a non-network context are a well-studied scientific and
engineering field. Generally, a dynamical system is any system that changes
in time, whose behavior is determined by some specific rules or (differential)
equations over a set of quantifiable variables. We distinguish between
continuous and discrete as well as deterministic and stochastic systems.
Strogatz [HD] and Barrat et al. [12] provide excellent introductions and
analyses of dynamical systems.

Different social and economic processes, which take place both offline 
and online, have been modeled with the use of dynamical systems. In the 
context of the Web, the primary focus of dynamical systems was set on 
analyzing and understanding the diffusion of information in online social 
networks [HB 1S21 EH ES], including the analysis of online memes and viral 
marketing. 

On the other hand, the Bass Model na describes how novel products
are accepted and adopted in a network and has seen a wide variety of
applications in different fields of research and also in practical use. The
model consists of two parameters, the propensity for innovation and the
propensity for imitation. Whether a product will be successfully accepted
and adopted by the community depends on the ratio between these two
parameters.
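
The interplay of the two parameters can be sketched with the standard discrete-time form of the Bass diffusion equation, in which the number of new adopters per step is (innovation + imitation · N/M) · (M − N), with N current adopters and a market of size M. The parameter values below are hypothetical and chosen only for illustration.

```python
def bass_adoption(innovation, imitation, market_size, steps):
    """Cumulative adopters over time in a discrete-time Bass model."""
    adopters = 0.0
    history = []
    for _ in range(steps):
        remaining = market_size - adopters
        # Innovators adopt spontaneously; imitators follow existing adopters.
        adoption_rate = innovation + imitation * adopters / market_size
        adopters += adoption_rate * remaining
        history.append(adopters)
    return history


# Hypothetical parameters: weak innovation, stronger imitation.
curve = bass_adoption(innovation=0.03, imitation=0.38,
                      market_size=1000, steps=30)
print([round(x) for x in curve[:10]])
```

With imitation much larger than innovation, adoption starts slowly, accelerates once enough adopters exist to be imitated, and then saturates at the market size.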

Acerbi et al. [2] investigated factors that determine how social traits
propagate within a specific popularity. Iribarren and Moro [12] conducted a
viral email experiment, allowing them to track the diffusion of information
in a social network. They showed that, due to heterogeneity in human
activity, the most common and simple growth equation from epidemic models is
not suitable to model information diffusion in social networks.

Recently, in the context of activity dynamics, Ribeiro [76] conducted an
analysis of the daily number of active users that visit specific websites,
fitting a model that allows predicting whether a website has reached
self-sustainability, defined by the shape of the curve of the daily number of
active users over time. He uses two constants α and β, where α represents the
constant rate at which active members influence inactive members to become
active, and β describes the rate at which an active member spontaneously
becomes inactive. Whenever β/α > 1, a website is unsustainable and, without
intervention, the daily number of active users will converge to zero. If
β/α < 1 and the number of daily active users is initially higher than the
asymptotic one, a website is categorized as self-sustaining.
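
The role of the ratio β/α can be illustrated with a deliberately simplified sketch. We assume a two-state mean-field reading of the description above, in which the active fraction A evolves as dA/dt = αA(1 − A) − βA. This is our own simplification for illustration and not Ribeiro's full model; the function name and all parameter values are hypothetical.

```python
def simulate_active_fraction(alpha, beta, initial, dt=0.01, steps=20000):
    """Euler integration of dA/dt = alpha * A * (1 - A) - beta * A."""
    active = initial
    for _ in range(steps):
        gained = alpha * active * (1.0 - active)  # inactive members activated
        lost = beta * active                      # active members dropping out
        active += dt * (gained - lost)
    return active


# beta / alpha < 1: the active fraction settles at 1 - beta / alpha.
print(round(simulate_active_fraction(alpha=0.5, beta=0.25, initial=0.9), 4))
# beta / alpha > 1: the active fraction decays towards zero.
print(round(simulate_active_fraction(alpha=0.25, beta=0.5, initial=0.9), 4))
```

In this simplified setting, the non-zero fixed point 1 − β/α only exists for β/α < 1, which matches the qualitative distinction between unsustainable and self-sustaining websites.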

The model presented in this paper to simulate activity dynamics heavily 
relies on the concept of dynamical systems on networks. We strongly believe 
that by modeling and understanding activity dynamics, we will gain a better 
understanding of the processes involved in and around the concept of peer 
influence in collaboration networks. Other areas of application for dynamical 
systems on networks are the modeling and simulation of diseases in the form 
of epidemic models, and opinions or traits of a person, also known as opinion 
dynamics. 

5.2.1 Epidemic Models 

Modeling the outbreak of diseases can be seen as a special case of dynamical
systems. At first, epidemic models dealt with the spreading of diseases
in social (real-life) networks [STj [38l SI SSI SZl EH SSJ El], ignoring the
underlying network aspect and simulating contractions and outbreaks via
random encounters within the whole population under investigation. For an
exhaustive survey of epidemic models, refer to Pastor-Satorras et al. m-

Subsequently, these models have been extended to include the structure
and other aspects of the underlying networks [72 EH SB EH EH EZ], limiting
the spread and outbreaks according to different factors. Further, epidemic
models were also utilized to simulate the spread of a plethora of properties
in different kinds of networks, such as viruses spreading in computer networks
[Ml S2 IZQl 12 Ha] and information propagation (e.g., memes) ISH among
others.

In general, epidemic models are based on the intuition that a disease
propagates through a social network with a given infection rate, defining the
probability that a neighbor of an already infected node contracts the disease.
Different models have been developed and analyzed to simulate epidemic
outbreaks of diseases that can only be transferred on contact in a
population or network [9] H EH ES]. Typically, such an outbreak is modeled
using a small number of possible states for each node and fixed transition
probabilities (e.g., β, γ), which define the probability or “threshold” that
has to be reached for a node to change to a different state. For example, the
SI model consists of only two states, susceptible and infected, and one
probability parameter β that determines when the transition from susceptible
to infected is initiated. Note that transitions in the SI model can only
occur from susceptible to infected, while already infected nodes remain
infected indefinitely. As the infection rate is relative to the population
under investigation, epidemic simulations with a small number of originally
infected hosts usually start off with the disease spreading slowly until
exponential growth is reached. Once the majority of the population carries
the disease, the infection process slows down again until the whole
population is infected.
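
The slow start, exponential growth and final saturation described above can be reproduced with a minimal mean-field sketch of the SI model, which ignores the network structure and assumes random mixing of the whole population. The function name and parameter values are illustrative assumptions.

```python
def simulate_si(beta, initial_infected, population, steps):
    """Mean-field SI model: new infections = beta * I * S / N per step."""
    infected = float(initial_infected)
    history = [infected]
    for _ in range(steps):
        susceptible = population - infected
        infected += beta * infected * susceptible / population
        history.append(infected)
    return history


# Hypothetical run: one infected host in a population of 1,000.
curve = simulate_si(beta=0.5, initial_infected=1, population=1000, steps=60)
print([round(x) for x in curve[::10]])
```

Since there is no recovery in the SI model, the number of infected hosts never decreases and eventually approaches the size of the population.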

A more sophisticated extension to the SI model is the SIR model [H [63],
which additionally introduces the recovered (or removed) state as well as an
additional parameter γ to model the transition from infected to recovered.
Again, transitions only occur from susceptible to infected to recovered. As
the name suggests, this newly introduced state allows nodes to become
immune to the disease, so that they will neither be infected in the future
nor be able to infect other nodes. Other models for simulating epidemic
outbreaks are the SIS and SIRS models, where the population can recover but
does not become immune (SIS) or stays immune but still has a chance to become
susceptible to infection again (SIRS) [ISl EH]-

Since their introduction, epidemic models have seen a wide array of
applications, for example, to analyze how computer viruses spread miiSlET]
or to study epidemics in complex (scale-free, power-law) networks [70]
[m [72] [62].

Among others, Wang et al. [SO] as well as Ganesh et al. [07] demonstrated
the importance of the network's spectrum (eigenvalues and eigenvectors of the
network adjacency matrix) for epidemic and dynamical network models [2H
Eg. We show a similar dependency of activity dynamics on eigenvalues in
this paper in Section

5.2.2 Collective Behavior & Opinion Dynamics

Another important field of application of dynamical systems on networks is
opinion dynamics. Such models are used to model collective behavior and
influence, usually in the form of a consensus-reaching task, at every point
in time. The main idea behind the concept of social influence is that
interacting agents strive to become more alike [33].

For example, agents in the Ising model for ferromagnets [ig [ig are
influenced by the state/opinions of the majority of their peers. This
influence naturally drives the system towards an ordered state where all
agents are either positive or negative (ferromagnets). Hence, the model can
be interpreted as a very simple model for simulating (binary) opinion
dynamics. However, the transition probabilities of the Ising model are
influenced by temperature, representing the modeling of external or
influential factors. In particular, if the temperature is above a certain
threshold, consensus-finding, in terms of magnetization, becomes an unstable
process that never converges. The Potts model [HU ED] further extends the
Ising model by increasing the number of potential states an agent can assume
from two (positive or negative) to an arbitrary number greater than two.
Another factor that might influence the process of reaching consensus is the
size of the system under investigation [82]. In particular, this means that
differently sized (or connected) systems potentially need different
strategies to reach consensus.

Opinions are usually represented as a set of words or numbers for each
agent individually. Weidlich [90] introduced such a model, based on
sociodynamics, in 1971. Galam et al. [36] and Galam and Moscovici [35]
analyzed the potential applications of the Ising model for simulating
opinion dynamics starting in 1982.

The most widespread and adapted models to simulate (among others)
opinion dynamics are the voter model [26] [10], the Axelrod model [8] as well
as The Naming Game m-

In the voter model, each agent is equipped with a binary variable. At each
step in time, the binary variable of one (randomly chosen) agent is
synchronized with the variable of one of its neighbors, introducing the
concept of social influence for opinion dynamics. The voter model has since
been adapted and extended by many researchers to fit an array of different
purposes (e.g., [5911601 EUlHlllHalEl]).
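
The update rule of the voter model can be sketched in a few lines. The example below runs it on a small complete graph; the function name, the adjacency-list representation, the stopping criterion and the parameter values are our own illustrative assumptions.

```python
import random


def voter_model(neighbors, opinions, max_steps, seed=0):
    """Run the voter model until consensus or until max_steps is reached.

    neighbors: dict mapping each node index to a list of neighbor indices.
    opinions:  list of binary opinions (0 or 1), one per node.
    Returns the final opinions and the number of update steps performed.
    """
    rng = random.Random(seed)
    opinions = list(opinions)
    nodes = list(range(len(opinions)))
    for step in range(max_steps):
        if len(set(opinions)) == 1:
            return opinions, step
        agent = rng.choice(nodes)
        # The chosen agent copies the opinion of one random neighbor.
        opinions[agent] = opinions[rng.choice(neighbors[agent])]
    return opinions, max_steps


# Complete graph with 10 agents, half of them holding each opinion.
size = 10
complete = {i: [j for j in range(size) if j != i] for i in range(size)}
start = [0] * 5 + [1] * 5
final, steps = voter_model(complete, start, max_steps=100000)
print(final, steps)
```

Because agents only ever copy existing opinions, a consensus state, once reached, cannot be left again.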

The Axelrod model [8] combines the notion of social influence (individuals
becoming more similar upon frequent interactions) with the tendency of
similar individuals to interact with each other more often. Each agent is
endowed with a set of characterizing variables. The more variables two
agents share, the more similar they are. Given this description, one would
assume that the described notions are self-reinforcing dynamics and hence
will inevitably produce stable networks with only identical agents. However,
Castellano et al. [IH] have shown that the resulting number of different
states depends on the number of characterizing variables. Large numbers are
likely to result in very few similar individuals (high agent diversity).
Analogously to the voter model, the Axelrod model has been extensively
adapted, analyzed and expanded by researchers to broaden our understanding
of the spread of (cultural) traits across agents (e.g., Klemm et al.
[501119]; Flache and Macy H).

The Naming Game originates from the idea of analyzing and exploring the
evolution of language pg. Baronchelli et al. m introduced the most basic
version of The Naming Game in 2006, in which a group of agents that
communicate via a complete network try to reach consensus when naming an
entity. Each agent holds a list of synonyms or words associated with the
entity under investigation, also referred to as its vocabulary. In every
iteration (or step in time), two agents are chosen. One agent is assigned
the role of the speaker, who randomly chooses a word from a given
(predefined) vocabulary. If the other agent, the listener, knows the chosen
word (i.e., also has the word in its vocabulary), both agents discard all
other words in their vocabularies and “agree” on the common word. However,
if the listener does not know the word of the speaker, the word is appended
to the listener's vocabulary and no words are discarded. In the next step,
another pair of nodes is chosen, and the process is repeated until either
consensus is found or a predetermined number of steps (time) has passed. The
Naming Game has spurred a complete line of dynamical models with a variety
of different parameters that each address different problems and tasks
(e.g., Abrams and Strogatz [1]; Minett and Wang [5H]; Wang and Minett [HE];
Castelló et al. [2T]). For an excellent and comprehensive introduction to
opinion dynamics (among others), we refer the interested reader to
Castellano et al. [20].
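
The interaction rule of The Naming Game can be sketched as follows. We assume that every agent starts with a single word drawn at random from a small predefined vocabulary and that the speaker picks the word to utter from its own current vocabulary; these initial conditions, the function name and the parameter values are illustrative assumptions rather than the exact setup of Baronchelli et al.

```python
import random


def naming_game(num_agents, max_steps, words=("a", "b", "c", "d"), seed=0):
    """Minimal Naming Game on a complete network.

    Returns the number of steps performed and the final vocabularies.
    """
    rng = random.Random(seed)
    # Assumption: each agent initially knows one random predefined word.
    vocabularies = [{rng.choice(words)} for _ in range(num_agents)]
    for step in range(1, max_steps + 1):
        speaker, listener = rng.sample(range(num_agents), 2)
        word = rng.choice(sorted(vocabularies[speaker]))
        if word in vocabularies[listener]:
            # Success: both agents keep only the agreed word.
            vocabularies[speaker] = {word}
            vocabularies[listener] = {word}
        else:
            # Failure: the listener learns the word, nothing is discarded.
            vocabularies[listener].add(word)
        first = vocabularies[0]
        if len(first) == 1 and all(v == first for v in vocabularies):
            return step, vocabularies
    return max_steps, vocabularies


steps, final = naming_game(num_agents=20, max_steps=100000)
print(steps, final[0])
```

The run stops as soon as every agent holds the same single word, or after the predetermined number of steps.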


6 Discussion, Limitations & Future Work 


We have developed a model to simulate and characterize the intricate
dynamics of activity in collaboration networks, consisting of an Activity
Decay Rate and a Peer Influence Growth Rate. First, we applied it to the
Zachary’s Karate Club dataset (see Figure ) to illustrate its core
mechanics. Subsequently, we continued with a linear stability analysis
(cf. Section 2.2) and depicted the behavior that can occur when the master
stability equation is invalidated (see Figure |^). Using our proposed model
to simulate activity dynamics, we have shown that the overall activity in
collaboration networks appears to be a composite of the Activity Decay Rate
and the Peer Influence Growth Rate, as described in Section . In Section ,
we fitted our model to synthetic and empirical datasets to simulate trends
in activity dynamics.

We have released a Python implementation of our model, to estimate empirical
parameters and run activity dynamics simulations, as Open Source Software at
https://github.com/simonwalk/ActivityDynamics.

The presented results are intended to be interpreted solely as
an indicator for trends in activity dynamics, rather than as absolute values
that can be used for accurately predicting the activity of a given system.
This is a direct result of the different approximations and simplifications
(cf. Section ) that we have made when estimating the parameters for our
activity dynamics simulation.

Note that one advantage of our model over other existing approaches,
such as autoregression, is the interpretability of the ratio λ/μ. For example,
a ratio of 4 means that users intrinsically lose activity 4 times faster than
they can get it back from one of their peers, while the coefficients of an
autoregression lack such interpretable characteristics. Further, using the
concept of dynamical systems, we can represent the underlying mechanisms in a
closed form, allowing for detailed analytical investigations (e.g., the
linear stability analysis), which are much harder (if not impossible) to
conduct for other models, such as agent-based models, autoregression or more
complex models based on dynamical systems.

For future work, we plan on extending the ability of our model to not only
reflect changes in activity dynamics but also properly cope with structural
changes in the underlying collaboration networks. One additional limitation
of the presented approach is the fact that nodes with a very small degree,
which are not connected to the largest connected component, will inevitably
lose activity until they reach the point of total inactivity. Including the
structural evolution of a collaboration network in our analyses will allow
us to mitigate this effect, as users will only be added to the collaboration
network, and considered in our calculations, once they have actually become
active. One potential approach involves the investigation of snapshots of
the collaboration networks at every τ, providing additional insights into
the evolution of the parameters of our model and the investigated systems.
Additionally, we assume that peer influence is a symmetric property. This
means that posts and replies exercise the same amount of influence on peers,
as we do not differentiate between different types of activity, and that
influence always traverses both directions of the edges in our collaboration
networks. Further, tasks that do not trigger entries in the change-logs
(e.g., reading articles, posts or replies) are not considered in our
experiments due to a lack of available data.

The fact that the Activity Dynamics Model only requires a single parameter
to be configured represents not only an advantage, but also a limitation.
Given that there is only one parameter that determines the evolution of
activity in a system, we are not able to model periodic fluctuations with
only one ratio. Instead, we have to calculate ratios for multiple points in
time. For future work, we plan on extending the Activity Dynamics Model
by adding parameters, for example, to model different external influences.
With this extended model, we will be able to simulate such periodic patterns
with a single configuration. On the other hand, we are only able to model
additional (social) mechanisms with the use of additional parameters. For
example, one reason for the decreasing levels of activity in Wikipedia might
also be a very high barrier for newly registered users to add content,
due to comprehensive guidelines for contributions and a very concentrated
and active community of power users. Over time, these power users leave
Wikipedia for various reasons, while new contributors who could fill the
gaps are lacking.

Furthermore, all of our estimated parameters are calculated for the
collaboration networks as a whole. Future work will also include extending
the Activity Dynamics Model to calculate the ratio λ/μ on a user level,
rather than on a network level. This modification not only potentially
increases the accuracy of our model but would also allow us to gather
additional information for each user of the corresponding networks. Further,
with an increased accuracy in our simulations, it will be possible to conduct
activity prediction experiments and emulate network attacks, as well as to
optimize (arbitrary) cost strategies for increasing activity in these systems.

In this context, it is also worth mentioning that decreasing levels of
activity for collaboration networks can also signal that the community has
completed its work and no further actions are required, as the intended goal
has been achieved. Further analyses are required to determine if completeness
and quality of content affect activity in collaboration networks. One could
even argue that, once we are able to calculate λ/μ for each user, we could
potentially observe the evolution of users and categorize different types of
users in collaboration networks (e.g., early adopters or experienced users
versus new and inexperienced users).

The ratio λ/μ, describing how fast users lose activity (Activity Decay
Rate λ) relative to how fast they regain activity from their neighbors (Peer
Influence Growth Rate μ), fluctuates below the corresponding largest
eigenvalue κ_1 for all investigated empirical datasets. Negative peaks in
this ratio represent periods of time (τ; in our case weeks) where activity
grew faster than could be compensated by the Peer Influence Growth Rate. It
naturally follows that a decrease of λ, resulting in less activity loss per
contribution for each user, is necessary to accomplish such drastic increases
of activity. If the network itself is of a smaller scale and/or these
negative peaks occur on a frequent basis, the activity dynamics of the
corresponding networks depend on the contributions (and thus influence) of
single (individual) users. To compare the stability of the activity dynamics
across multiple networks, we calculated the System Mass and the Activity
Momentum p, indicating the force required to accelerate the corresponding
collaboration networks or render them inactive.

When comparing p and the results of our empirical illustration (cf.
Figures and 10) between the different datasets, we can see that the Activity
Momentum is very small for datasets that either (i) exhibit only a very small
number of changes and are close to inactivity or (ii) exhibit a small κ_1
(see Figure]^ and 10). This suggests that we can use Activity Momentum as an
indicator for the robustness of a collaboration network with regard to its
activity dynamics.

Further, we can characterize the potential of a collaboration network to
become self-sustaining by comparing the calculated ratios of λ/μ with the
corresponding κ_1 and Activity Momentum. If the ratio is below κ_1, our
master stability equation is invalidated, pushing the system towards a new
fixed point where the forces of the Activity Decay Rate and the Peer
Influence Growth Rate reach an equilibrium, so that the network converges
towards a state of immanent and lasting activity (see Figure [^). If such a
state is reached and combined with a high Activity Momentum, the
corresponding collaboration network has reached critical mass of activity and
has become self-sustaining; no external impulses are required to keep the
network active. Of course, in real-world scenarios, activity will not last
forever without providing additional incentives, as interest (and thus
activity) in a system potentially decays over time. As a consequence, this
would first result in an increase of p and inevitably, with a sufficiently
large p, the collaboration network would return to its stable fixed point,
once our master stability equation holds again, and activity would once more
converge towards zero. Once we extend our model to allow for user-based
calculations, we will be able to calculate Activity Momentum not only for
collaboration networks, but also for single and individual users.
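
The comparison between the ratio and the largest eigenvalue can be sketched as follows. The sketch estimates the largest eigenvalue of a small, connected, undirected example network via power iteration and then applies the condition as stated in the text (ratio above the largest eigenvalue: activity converges towards zero). The function names and the example network are illustrative assumptions and are not taken from the released implementation.

```python
def largest_eigenvalue(adjacency, iterations=500):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on A + I and shifted back afterwards, so that it
    also converges on bipartite graphs. Assumes a connected network.
    """
    size = len(adjacency)
    vector = [1.0] * size
    estimate = 1.0
    for _ in range(iterations):
        product = [
            vector[i] + sum(a * v for a, v in zip(adjacency[i], vector))
            for i in range(size)
        ]
        estimate = max(product)
        vector = [x / estimate for x in product]
    return estimate - 1.0


def converges_to_inactivity(ratio, kappa_1):
    """Master stability condition as described in the text: ratio > kappa_1."""
    return ratio > kappa_1


# Hypothetical example: a triangle of three mutually connected users.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
kappa_1 = largest_eigenvalue(triangle)
for ratio in (4.0, 1.5):
    print(ratio, converges_to_inactivity(ratio, kappa_1))
```

For the triangle, the largest eigenvalue is 2, so a ratio of 4 satisfies the stability condition (activity vanishes), whereas a ratio of 1.5 invalidates it.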